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1 Introduction 

An important feature of modern science and engineering is that data of various kinds is being produced at 
an unprecedented rate. This is so in part because of new experimental methods, and in part because of the 
increase in the availability of high powered computing technology. It is also clear that the nature of the data 
we are obtaining is significantly different. For example, it is now often the case that we are given data in the 
form of very long vectors, where all but a few of the coordinates turn out to be irrelevant to the questions 
of interest, and further that we don’t necessarily know which coordinates are the interesting ones. A related 
fact is that the data is often very high-dimensional, which severely restricts our ability to visualize it. The 
data obtained is also often much noisier than in the past, and has more missing information (missing data). 
This is particularly so in the case of biological data, particularly high throughput data from microarray or 
other sources. Our ability to analyze this data, both in terms of quantity and the nature of the data, is 
clearly not keeping pace with the data being produced. In this paper, we will discuss how geometry and 
topology can be applied to make useful contributions to the analysis of various kinds of data. Geometry and 
topology are very natural tools to apply in this direction, since geometry can be regarded as the study of 
distance functions, and what one often works with are distance functions on large finite sets of data. The 
mathematical formalism which has been developed for incorporating geometric and topological techniques 
deals with point clouds, i.e. finite sets of points equipped with a distance function. It then adapts tools 
from the various branches of geometry to the study of point clouds. The point clouds are intended to be 
thought of as finite samples taken from a geometric object, perhaps with noise. Here are some of the key 
points which come up when applying these geometric methods to data analysis. 


Qualitative information is needed: One important goal of data analysis is to allow the user to 
obtain knowledge about the data, i.e. to understand how it is organized on a large scale. For example, 
if we imagine that we are looking at a data set constructed somehow from diabetes patients, it would 
be important to develop the understanding that there are two types of the disease, namely the juvenile 
and adult onset forms. Once that is established, one of course wants to develop quantitative methods 
for distinguishing them, but the first insight about the distinct forms of the disease is key. 

Metrics are not theoretically justified: In physics, the phenomena studied often support clean 
explanatory theories which tell one exactly what metric to use. In biological problems, on the other 
hand, this is much less clear. In the biological context, notions of distance are constructed using some 
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intuitively attractive measures of similarity (such as BLAST scores or their relatives), but it is far from 
clear how much significance to attach to the actual distances, particularly at large scales. 

• Coordinates are not natural: Although we often receive data in the form of vectors of real numbers, 
it is frequently the case that the coordinates, like the metrics mentioned above, are not natural in any 
sense, and that therefore we should not restrict ourselves to studying properties of the data which 
depend on any particular choice of coordinates. Note that the variation of choices of coordinates 
does not require that the coordinate changes be rigid motions of Euclidean space. It is often a tacit 
assumption in the study of data that the coordinates carry more intrinsic meaning than they actually 
do. 

• Summaries are more valuable than individual parameter choices: One method of clustering 
a point cloud is the so-called single linkage clustering, in which a graph is constructed whose vertex set 
is the set of points in the cloud, and where two such points are connected by an edge if their distance 
is < e, where e is a parameter. Some work in clustering theory has been done in trying to determine 
the optimal choice of e, but it is now well understood that it is much more informative to maintain the 
entire dendrogram of the set, which provides a summary of the behavior of clustering under all possible 
values of the parameter e at once. It is therefore productive to develop other mechanisms in which the 
behavior of invariants or construction under change of parameters can be effectively summarized. 

In this paper, we will discuss methods for dealing with the properties and problems mentioned above. The 
underlying idea is that methods inspired by topology should address them. For each of the points above, we 
describe why topological methods are appropriate for dealing with them. 

• Topology is exactly that branch of mathematics which deals with qualitative geometric information. 
This includes the study of what the connected components of a space are, but more generally it is 
the study of connectivity information, which includes the classification of loops and higher dimensional 
surfaces within the space. This suggests that extensions of topological methodologies, such as homology, 
to point clouds should be helpful in studying them qualitatively. 

• Topology studies geometric properties in a way which is much less sensitive to the actual choice of 
metrics than straightforward geometric methods, which involve sensitive geometric properties such as 
curvature. In fact, topology ignores the quantitative values of the distance functions, and replaces it 
with the notion of infinite nearness of a point to a subset in the underlying space. This insensitivity 
to the metric is useful in studying situations where one only believes one understands the metric in a 
coarse way. 

• Topology studies only properties of geometric objects which do not depend on the chosen coordinates, 
but rather on intrinsic geometric properties of the objects. As such, it is coordinate-free. 

• The idea of constructing summaries over whole domains of parameter values involves understanding 
the relationship between geometric objects constructed from data using various parameter values. The 
relationships which are useful involve continuous maps between the different geometric objects, and 
therefore become a manifestation of the notion of functoriality, i.e the notion that invariants should 
be related not just to objects being studied, but also to the maps between these objects. Functoriality 
is central in algebraic topology, in that the functoriality of homological invariants is what permits 
one to compute them from local information, and that functoriality is at the heart of most of the 
interesting applications within mathematics. Moreover, it is understood that most of the information 
about topological spaces can be obtained through diagrams of discrete sets, via a process of simplicial 
approximation. 

The last point above, concerning functoriality, is critical. In developing methods to address the first two 
points, we find that we are forced to make functorial geometric constructions and analyze their behavior on 
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maps even to obtain information about single point clouds. Functoriality has proven itself to be a powerful 
tool in the development of various parts of mathematics, such as Galois theory within algebra, the theory of 
Fourier series within harmonic analysis, and the applicaton of algebraic topology to fixed point questions in 
topology. We argue that, as suggested in [46], it has a role to play in the study of point cloud data as well, 
and we give two illustrations of how this could happen, within the context of clustering. 

Informally, clustering refers to the process of partitioning a set of data into a number of parts or clusters, 
which are recognizably distinguishable from each other. In the context of finite metric spaces, this means 
roughly that points within the clusters are nearer to each other than they are to points in different clusters. 
Clustering should be thought of as the statistical counterpart to the geometric construction of the path- 
connected components of a space, which is the fundamental building block upon which algebraic topology 
is based. There are many schemes which construct clusterings based on metric information, such as single, 
average, and complete linkage clustering, /e-means clustering, spectral clustering, etc. (see [31]). Although 
clustering is clearly a very important part of data analysis, the ways in which it is formulated and implemented 
are fraught with ambiguities. In particular, the arbitrariness of various threshhold choices and lack of 
robustness are difficulties one confronts. Much of current research efforts are focused in this direction (see 
e.g. [43] and [39]), and functoriality provides the right general mathematical framework for addressing 
them. For example, one can construct data sets which have been threshholded at two different values, and 
the behavior of clusters under the inclusion of the set with tighter threshhold into the one with the looser 
threshhold is informative about what is happening in the data set. We present two additional examples of 
how functoriality could be used in analyzing some questions related to clustering. 

Example: In the case of very large X, it may often be difficult to apply the clustering algorithm to a full 
data set, and one may instead find it desirable to cluster subsamples from X. One is then confronted with 
the task of attempting to verify that the clustering of the subsample is actually representative of a clustering 
of the full data set X. One way of proceeding is to construct two samples from X, and hoping that they are 
consistent in an appropriate sense. One version of this idea would be to consider the subsamples Xi and X2, 
together with their union X\ U X 2 . One could apply the clustering scheme to each of these sets individually, 
and suppose we denote the set of clusters for the three sets Xi, X 2 , and X- t U X 2 by C(X 1), C(X 2 ), and 
C(X 1 UX 2 ) respectively. If the clustering scheme were functorial, i.e. if inclusions of data sets induced maps 
of the collections of clusters, then one would have a diagram of sets 


C(X 1 U X 2 ) 



C(X 1 ) C(X 2 ) 


If the clusterings are consistent, i.e. if the clusters in C(X 1) and C(X 2 ) in C(X-\ U X 2 ) correspond well 
under these maps, one can regard that as evidence that the subsample clusterings actually correspond to 
clusterings on the full data set X. Of course, what the phrase “correspond well” means is not well defined 
here. Later in the paper, we will discuss a way to attach more quantitative information to questions of this 
type. 

Example: Suppose that we have data X which varies with time. One could then ask for information 
concerning the behavior of clusterings produced by clusterings over time. Clusters can appear, vanish, 
merge, or split into separate clusters. The analysis of this behavior can be studied using functoriality. For 
to < ti, we let X[to , t-\] denote the set of points in the data set occurring between times to and t\. If we have 
t 0 <h <t 2 < t 3 , then we have a diagram of point cloud data sets 
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X[to,tit\ 


X[h,t 3 ] 


If the clustering scheme is functorial in the sense of the preceding example, we obtain the corresponding 
diagram of sets 


C(X[t u t 2 ]) 




C(X[t 0 ,h]) 


C[X[t u t 3 ]) 


This set contains information about the behavior over time of the clusters. For example, the diagram 



would correspond to a single cluster at time to, which breaks into two clusters in the interval [t*, ij]., which 
in turn merge back again in the interval [£2,^3]- 

This paper will deal with a number of methods for thinking about data using topologically inspired methods. 
We begin with a discussion of persistent homology, which is a mathematical formalism which permits us 
to infer topological information from a sample of a geometric object, and show how it can be applied to a 
particular data sets arising from natural image statistics and neuroscience. Next, we show that topological 
methods can produce a kind of imaging of data sets, not by embedding in Euclidean space but rather 
by producing a simplicial complex associated to certain initial information about the data set. We then 
demonstrate that persistence can be generalized in several different directions, providing more structure and 
information about the data sets in question. We then show that the philosophy of functoriality can be used 
to reason about the nature of clustering methods, and conclude by speculating about theorems one might 
hope to prove and discussing how the subject might develop more generally. 

The author is extremely grateful to his collaborators A. Blumberg, A. Collins, V. de Silva, L. Guibas, T. 
Ishkanov, F. Memoli, M. Nicolau, D. Ringach, G. Sapiro, H. Sexton, G. Singh, and A. Zomorodian. In 
addition, he is grateful to a number of people for very useful conversations on this subject. They include T. 
Beke, P. Diaconis, R. Ghrist, S.Holmes, K. Mischaikow, P. Niyogi, S. Oudot, S. Smale, and S. Weinberger. 
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2 Persistence and homology 


2.1 Introduction 

In thinking about qualitative properties of spaces X, an obvious one which comes to mind is its decomposition 
into path connected components. It is a partition of X, and the cardinality of the collection of blocks in this 
partition is an invariant of X, and surely deserves to be called a qualitative invariant of X. There are other 
qualitative properties to be considered. 

Example: Consider the two spaces in the image below. 


We note that both spaces are path-connected, but we can see that they are qualitatively distinct in that the 
letter “B” on the left has two essentially different loops and the “O” on the right has only one. This property 
is preserved under continuous deformations, and so if one can formalize it into a precise mathematical 
statement one can then rigorously distinguish between the two spaces. We refer to information about 
loops and higher dimensional analogues in a space as “connectivity information”. The decomposition into 
path connected components would be regarded as zeroth level connectivity information, loops as level one 
connectivity information, and so forth. The mathematical formalism which makes these notions precise is 
algebraic topology. It provides signatures which capture the intuitive notions of essential loops or essential 
higher dimensional surfaces within a space. We describe the output of the formalism. See [33] for a thorough 
treatment. We recall that two continuous maps f,g:X—>Y are said to be homotopic if there is a continuous 
map H : X x [0,1] —> Y so that H(x, 0) = f (x) and H(x, 1) = /(1). 


• Definition: For any topological space X, abelian group A, and integer k > 0, there is assigned a 
group H k (X,A). 

• Functoriality: For any A and k as above, and any continuous map / : X —> Y, there an induced 
homomorphism H k {f,A) : H k {X,A) -> H k {Y,A). One has H k {f°g,A) = H k (f,A)° H k {g,A) and 
H k (Idx ; A) = Idfi k (x,A)- These conditions are called collectively functoriality. We refer the reader to 
[44] for a treatment of categories and functors. 

• Homotopy invariance: If / and g are homotopic, then H k (f. A) = H k (g. A). It follows that if X 
and Y are homotopy equivalent, then H k (X, A) is isomorphic to H k (Y,A). 

• Normalization: Hq(*,A) = A, where * denotes the one point space. 

• Betti numbers: For any field F, H k (X,F ) will be a vector space over F. Its dimension, if it is 
finite dimensional, will be written as /3 k (X,F), and will be referred to as the k- th Betti number with 
coefficients in F. The k- th Betti number corresponds to an informal notion of number of independent 
fc-dimensional surfaces. If two spaces are homotopy equivalent, then all their Betti numbers are equal. 


Example: For any topological space X with a finite number of path components, PoiX) is the number of 
path components. 

Example: The first Betti number pi of the letter “B” above is two, and for the letter “O” it is one. In this 
case, it provides a formalization of the count of the number of loops present in the space. 
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The actual definition of homology which applies to all topological spaces (singular homology) was introduced 
in [27]. It relies on the linear algebra of infinitely generated modules over the ring Z in defining homology 
groups, and for this reason it is not useful from a computational point of view. Computations can be carried 
out by hand using a variety of techniques (long exact sequence of a pair, long exact Mayer-Vietoris sequence, 
excision theorem, spectral sequences), but direct computation from the definition is not feasible for general 
spaces. However, when one is given a space equipped with particular structures, there are often finite linear 
algebra problems which produce correct answers, i.e. answers which agree with the singular technique. A 
particularly nice example of this applies when the space in question is described as a simplicial complex. 


Definition 2.1 An abstract simplicial complex is a pair (V, E), where V is a finite set, and E is a family 
of non-empty subsets ofV such that a £ E and t C a implies that t £ E. Associated to a simplicial complex 
is a topological space \(V, E)|, which may be defined using a bijection <j> : V —* {1,2,... ,N} as the subspace 
ofR N given by the union (J<7eE c ( cr )> where c(cr) is the convex hull of the set {e^( s )} se(7 , where e* denotes 
the i — th standard basis vector. 


Intuitively, a simplicial complex structure on a space is an expression of the space as a union of points, 
intervals, triangles, and higher dimensional analogues. Simplicial complexes provide a particularly simple 
combinatorial way to describe certain topological spaces. For this reason, one often attempts to approximate 
(in various senses) topological spaces by simplicial complexes. The key point for this section, though, is 
that for simplicial complexes, the homology can be computed using only linear algebra of finitely generated 
Z-modules. We describe this in detail. Given any simplicial complex X = (V, E), we write E& for the subset 
of E consisting of all cr £ E for which #(cr) = k + 1. Elements of E fe are referred to as fc-simplices. We define 
the group of fe-chains in X as the group of formal linear combinations of elements in E*,, or equivalently the 
free abelian group on the set E*,, and denote it by Ck(X). If we impose a total order on the vertex set V, 
we define set operators d* : E*, —> Efc_i, for 0 < i < k, by letting d t (o) = o — {.s,}, where ,s t denotes the i-th 
element in cr, under the given total ordering. We now define linear operators dk : C k (X) —> Ck-i(X) by 

d k = x>i )% 

i=0 


Since the groups C k (X) are equipped with the bases E fc , these operators can be expressed as matrices D(k) 
whose columns are parametrized by E*,, whose rows are parametrized by Efc_i, and where for a G E& and 
r € Efc_i, the entry D(k) Ta is = 0 if t (f o, and = (—1)* if r C cr and if r is obtained by removing the 
i-tli member of the subset cr. The key observation is now that d k ° d k +i = 0. It follows that Image{d k + i) £ 
Kernel(dk), and that one can therefore define Hf' mp (X. Z) by 

H s k imp (X, Z)=Kernel(d k )/Image(d k+1 ) 


This basis independent version of the definition can be replaced by the result of matrix manipulations on 
the collection of matrices {D(k)}k>oF, as in [22]. The end results of these calculations are always the Smith 
normal form of various matrices constructed out of the D{kf s. It turns out that Hf' mp {X, 7L) is always 
canonically isomorphic to the singular homology of the space |X|. The conclusion is that for simplicial 
complexes, homology is algorithmically computable. 


2.2 Building coverings and complexes 


Since the homology of simplicial complexes is algorithmically computable, it is frequently desirable to con¬ 
struct simplicial complexes which compute the homology of an underlying space X, or at least has a strong 
relationship with it. One way to guarantee that the simplicial complex computes the homology of X is to 




produce a homotopy equivalence from X to the simplicial complex, or a homotopy equivalence from a space 
homotopy equivalent to X to the simplicial complex. There are a number of simplicial complexes which can 
be constructed from X together with additional data attached to X. We begin with the Cech complex. Let 
X be a topological space, and let U = {U a } ae A be any covering of X. 


Definition 2.2 The Cech complex ofU, denoted by C(U), will be the abstract simplicial complex with vertex 
set A, and where a family {aoi ■ • •, a/.} spans a k-simplex if and only if U ao n ... fl U ak ^ 0. 


This is an extremely useful construction in homotopy theory. One reason is that one has the following 
“nerve theorem” (see [5]), which provides criteria which guarantee that C(U) is homotopy equivalent to the 
underlying space X. (Recall that two spaces X and Y are said to be homotopy equivalent if there are maps 
/ : X —¥ Y and g : Y —* X so that / ° g and g ° / are homotopic to Idy and Idx respectively. A space which 
is homotopy equivalent to the one point space is called contractible.) 


Theorem 2.3 Suppose that X and U are as above, and suppose that the covering consists of open sets and 
is numerable (see [51] for a definition). Suppose further that for all 0 ^ S C A, we have that fl ses^s is 
either contractible or empty. Then C(U) is homotopy equivalent to X. 


One now needs methods for generating coverings. When the space in question is a metric space, one covering 
is given by the family B f (X) = {B e (x)} xe x, for some e > 0. More generally, for any subset V C X for 
which X = U„ey -B f (v) , one can construct the nerve of the covering {B f (v)} ve v This is a useful theoretical 
construction, in view of the following theorem. 


Theorem 2.4 Let M be a compact Riemannian manifold. Then there is a positive number e so that 
C(B e (M)) is homotopy equivalent to M whenever e < e. Moreover, for every e < e, there is a finite 
subset V C M so that the subcomplex of C(B e (M)) on the vertices in V is also homotopy equivalent to M. 


One problem with this construction is that it is computationally expensive, in that it requires storage of 
simplices of various dimensions. An idea for dealing with that problem is to construct a simplicial complex 
which can be recovered solely from the edge information. This suggests the following variant of the Cech 
construction, referred to as the Vietoris-Rips complex. 


Definition 2.5 Let X denote a metric space, with metric d. Then the Vietoris-Rips complex for X, attached 
to the parameter e, denoted by VR(X,e), will be the simplicial complex whose vertex set is X, and where 
{xq,x\, ... ,Xk} spans a k-simplex if and only if d{xi,xf) < e for all 0 < i,j < k. 

We note that the vertex sets of the two constructions are identical, so they can both be viewed as subcom¬ 
plexes of the complete simplex on the set X. The following diagram indicates the difference between the 
complexes. 
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The leftmost figure shows the covering, the middle the Cech complex, and the rightmost the Vietoris-Rips. 

The following comparison between the two complexes is easy to verify. We will see how to make use of it in 
the next section. 


Proposition 2.6 We have inclusions 

C(X,e) C VR(X,2e) C C(X, 2e) 


Even the Vietoris-Rips complex is computationally expensive, though, due to the fact that its vertex set 
consists of the entire metric space in question. A solution to this problem which has been used to study 
subspaces of Euclidean space is the Voronoi decomposition. Let X be any metric space, and let £ C X be a 
subset, called the set of landmark points. Given A e £, we define the Voronoi cell associated to A, V\, by 


V x = {x£X\d(x,X) <d(x,X')} 

for all A' € £. It is immediate that the Voronoi cells form a covering of X, and we define the Delaunay 
complex attached to £ to be the nerve of this covering. When the underlying space is Euclidean space, the 
Voronoi decomposition gives rise to an extremely useful decomposition of the space, and in favorable cases 
the Delaunay complex gives a triangulation of the convex hull of £, referred to as the Delaunay triangulation 
[21]. For submanifolds of Euclidean space, one may construct the restricted Delaunay triangulation as in [25]. 
The value of this construction is that it produces very small simplicial complexes, whose dimension is often 
equal to the dimension of the manifold under consideration. Both the the Cech and Vietoris-Rips complexes 
typically produce simplices in dimensions much higher than the dimension of the space. The definition of 
the Delaunay complex makes sense for any metric space, in particular for finite metric spaces. However, for 
finite metric spaces, it generically produces degenerate (i.e. discrete) complexes, with no one dimensional 
simplices. This is due to the fact that for finite metric spaces, it is generically the case that each value of the 
distance is taken only for one pair of points, so one does not have any points which are equidistant between 
a pair of landmarks. In order to make the method useful for finite metric spaces, we therefore modify the 
definition of the Delaunay complex to accommodate pairs of points which are “almost” (as permitted by the 
introduction of a parameter e) equidistant from a pair of landmark points. Precisely, we have the following 
definition from [8]. 

Definition 2.7 Let X be any metric space, and suppose we are given a finite set £ of points in X, called 
the landmark set, and a parameter e > 0. For every point x £ X, we let m x denote the distance from this 
point to the set £, i.e. the minimum distance from x to any point in the landmark set. Then we define 
the strong witness complex attached to this data to be the complex W S (X, £,e) whose vertex set is £, and 
where a collection {7o, ■ ■ ■ ,h} spans a k-simplex if any only if there is a point x £ X (the witness) so that 
d(x,li) < m x + e for all i. We can also consider the version of this complex in which the 1-simplices are 

identical to those ofW(X, £, e), but where the family {/o_ - Ik} spans a k-simplex if and only if all the pairs 

( li,lj ) are 1-simplices. We’ll denote this by Wy R 

There is a modified version of this construction, which is quite useful, called the weak witness construction. 
Suppose we are given a metric space X, and a set of points £ C X. Let A = {fy, ■ ■ ■ ,h} denote a finite 
subset of a metric space £. We say a point x £ X is a weak witness for A if d(x. 1) > d(x, If) for all i and all 
l f A. Given e > 0, we will also say that x is an e weak witness for A if d(x. 1) + e > d(x, If) for all i and all 
l (f A. 


Definition 2.8 Let X, £, and e be as above. We construct the weak witness complex for the given data, 
W W (X, £, e) by declaring that a family A = {/q, ..., fy} spans a k-simplex if and only if A and all its faces 







admit e weak witnesses. This complex also clearly has a version in which a k-simplex is included as a simplex 
if and only if all its 1-faces are, and we will denote this version by Wy R . 

It is clearly the case that if we have 0 < e < e', then we have an inclusion 
W s (X,C,e) •-* W s {X,C,e’) 


and similarly for Wy R ,W w , and Wy R . 

2.3 Persistent homology 

Let X be a subspace of R". Suppose further that we have a method of sampling points from X, perhaps 
with noise. By sampling with noise, we will mean that we are sampling points from a probability distribution 
concentrated near X. Let X be one sample as described above. An interesting question is to what extent it is 
possible to infer the Betti numbers of X from X. In general, of course, the answer is no. For example, it will 
clearly be necessary to assume something about the density of the sampling. Niyogi, Smale, and Weinberger 
in [52] provide sufficient hypotheses which guarantee that this is possible for Riemannian manifolds. Their 
method is to prove that the Cech complex associated to a covering by balls of a fixed radius e is homotopy 
equivalent to the underlying manifold. If one is interested in studying data from experiments, though, one 
typically cannot assume that the data lies along a submanifold. Further, even if one could assume that the 
data lies along a manifold, one is usually not in a position to verify that the stringent hypotheses of [52] are 
satisfied. The key to obtaining the desired homological information is to avoid selecting a fixed value of the 
threshhold e, and instead obtaining a useful summary of the homological information for all the different 
values of e at once. This philosophy is referred to as persistence, and was first introduced in [24]. 

We begin with the set X. It is of course a finite metric space, and we may consider the Cech complexes 
C(X, e), attached to the collection of balls of radius e with centers at the points of X. Note that if the 
centers actually he on a submanifold M C R n , and the set X is sufficiently dense in M, then this complex 
is the Cech complex attached to a covering of M. If further e is sufficiently small, then all the balls will be 
geodesically convex, and the complex will compute the homology of M correctly. This connection provides 
heuristic justification for the use of this Cech complex as a method for approximating the homology of M. 
Now recall that whenever we have e < e 7 , we have an inclusion of complexes C(X, e) C C(X, e 7 ). Consider the 
picture below, which is that of a Cech complex constructed on a finite collection of points in the Euclidean 
plane. 



We note that the main shape of the set is concentrated around a circle. However, if we compute the homology 
of this complex, it will yield a first Betti number of three, namely including the large main loop and secondly 
the two smaller loops corresponding to the two smaller holes in the complex. Intuitively, we regard these two 


holes as coming from faulty sampling or other errors in the recovery of the data. One could then argue that 
this comes from an incorrect choice of the parameter e, and that one should simply increase the parameter 
value to obtain a complex with the correct higher connectivity structure. This would give rise to the following 
picture. 



Note that while the two smaller holes have now been “filled in”, a new hole has been introduced in the 
lower right hand portion of the figure. Consequently, if we computed the homology of this complex, we 
would obtain a first Betti number of two. The result is incorrect for either of the parameter values. We now 
observe, though, that there is an inclusion of the upper complex into the lower complex, since the upper 
one corresponds to a smaller parameter value than the lower one. We can therefore ask about the image 
of the homology of the upper complex in the homology of the lower complex. The two small cycles in the 
upper complex vanish in the lower complex, since they are filled in. On the other hand, the small cycle in 
the lower complex is not in the image of the homology of the upper complex, since the required edge is not 
filled in in the upper complex. We see therefore that the image consists exactly of the larger cycle, which is 
what we regard as the “correct” answer in this case. The goal of this section is to make this observation into 
a systematic computational scheme which will provide the desired summary of the behavior of homology 
under all choices of values for the scale parameter e. 

We begin with the following definition. Again, refer to [44] for material on categories, functors, and natural 
transformations. 

Definition 2.9 Let C be any category, and V a partially ordered set. We regard V as a category V in the 
usual way, i.e. with object set V, and with a unique morphism from x to y whenever x < y. Then by a V 
persistence object in C we mean a functor : V —* C More concretely, it means a family {c x } x6 -p of objects 
of C together with morphisms <f xy : c x —* c y whenever x < y, such that <f> yz ° <j> xy = <j> xz whenever x <y < z. 
Note that the V-persistence objects in C form a category in their own right, where a morphism F from 4> 
to 'b is a natural transformation. Again, in more concrete terms, a morphism from a family {c x ,<j) xy } to a 
family {d x , ip X y} is a family of morphisms {f x }, with f x : c x —t d x , and where the diagrams 



all commute. We will denote the category of V-persistence objects in C by V pera (C). We note finally that if 
f :V —► Q is a partial order preserving map, we obtain an evident functor f* : Q per s(Q 0 —► 'Ppers(fZ) defined 
by /*(4') = T ° f, where f is f regarded as a functor V^Q. 

We let M and N denote the partially ordered sets of real number and non-negative integers, respectively. We 
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now observe that all of the constructions of the previous subsection (Cech, Vietoris-Rips, witness) yield an 
R-persistence simplicial complex attached to X. We can now construct the associated chain complexes and 
homology groups and obtain R-persistence chain complexes and R-persistence groups. What makes homology 
useful as a discriminator between topological spaces is the fact that there is a classification theorem for finitely 
generated abelian groups. If one had a classification theorem for R-persistence abelian groups, then it could 
act as a summary of the behavior of the homology of all the complexes C(X, e). However, we do not have 
such a theorem. However, it turns out that there is a classification theorem (see [64]) for a subcategory of 
the category of N-persistence F-vector spaces, where F is a field. 

To understand this classification, we observe that N-persistence abelian groups can be identified with a more 
familiar notion, namely that of a graded module over a graded ring. Let {A n } be any N-persistence abelian 
group. We will define an associated graded module 9({A n }) over the graded polynomial ring Z[t], where t is 
assigned degree 1, as follows. We set 


W) = ®4 

s>0 

where the n-th graded part is the group A n . The action of the polynomial generator t is given by 
t ■ {a n } = {/?„}, where /3 n = 

It is readily checked that 0 is a functor from N pf . r fiAb) to the category of graded Z[t]-modules, and is in fact 
an equivalence of categories, since an inverse functor can be given by M* —► {M„}, where the morphisms 
if mn are given by multiplication by t n ~ m . The conclusion is that the category N ?)ers (T6) is equivalent to 
the category of non-negatively graded modules over Z [f]. Now, there is still no classification theorem for 
graded Z[t]-modules. However, if we let F denote any field, then there is a classification theorem for finitely 
generated graded F[t]-modules. See [23] for the non-graded case. The graded case is proved in identical 
fashion. 


Theorem 2.10 Let M* denote any finitely generated non-negatively graded F[t]-module. Then there is an 
isomorphism 


M*-0F[t](f s )©0(F[t]/(^))(i,) 

where for any graded F[t]-module N *, the notation N t (s) denotes N * with an upward dimension shift of s. 
So, N*(s)i = Ni—s ■ The decomposition is unique up to permutation of factors. 

It is therefore a useful question to ask which N-persistence F-vector spaces correspond under 9 to finitely 
generated non-negatively generated F[t]-modules. We have the following. 


Proposition 2.11 We say an N -persistence F-vector space {V} n is tame if every vector space V n is finite 
dimensional, and if ip n>n +1 : V n —> V n+ i is an isomorphism for sufficiently large n. Then we have that 
9({V n } n ) is a finitely generated F[t]-module if and only if {V n } n is tame. 


We now have an easy translation of the classification result 2.10. For any 0 < m < n, we define an 
N-persistence F-vector space U(m,n) by setting = 0 for t < m and t > n, U(m,n ) = F for 

m < t < n, and b> S)t = Idp for m < s <t < n. We extend this definition to the value n = +oo in the evident 
way. 
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Proposition 2.12 Any tame N -persistence F-vector space {V n } n can be decomposed as 


N 

{K}n = 0t/K,ni) 


where each mi is a non-negative integer, and n * is a non-negative integer or +oo. The decomposition is 
unique in the sense that the collection of pairs {(rnj,rij)}j is unique up to ordering of factors. 


We can reformulate this result as follows. By a bar code, we mean a finite set of pairs (m, n), where m is a 
non-negative integer, and n is a non-negative integer or +oo. We can now restate Proposition 2.11 as the 
assertion that just as finite dimensional vector spaces are classified up to isomorphism by their dimension, 
so tame N-persistence vector spaces are classified by associated bar codes. 

Returning to the R-persistence simplicial complexes we construct, we may use any partial order preserving 
map N —> R to obtain an N-persistence simplicial complex. There are at least two useful ways to construct 
such maps. The first would be to choose a small number e, and define a map f e : N —» R by / e (n) = ne. 
A second method would be as follows. Given a finite point cloud as above, it is clear that there are only 
finitely many real values at which there are transitions in the complex. This follows from the nature of the 
conditions together with the fact that the distance function takes only finitely many values on X. Letting 
these transition values be enumerated in increasing order as {to,ti ,..., by}; we define an order preserving 
map g : N —> R by g(n) = t n for n < N, and g(n) = tiy for n > N. The first construction can be interpreted 
as sampling values of the persistence parameter from a uniform lattice. Of course, the sampling is finer 
as s decreases. The second method more efficient, since it is precisely adapted to the complex at hand. 
Furthermore, it contains complete information about the original M-vector space. 

The methodology we now use to study the homology of the complexes constructed above is now as follows. 

• Construct the R-persistence simplicial complex {C f } using Cech, Vietoris-Rips, or witness methods. 
We will denote it by d>. 

• Select a partial order preserving map / : N —> R. 

• Construct the associated N-persistence simplicial complex. 

• Construct the associated N-persistence chain complex {C*(n)}„ with coefficients in F. (It is evi¬ 
dent from the finiteness hypotheses on X and the nature of the constructions that the associated 
N-persistence F-vector spaces are tame.) 

• Compute the barcodes associated to the N-persistence vector spaces {//)(C» (»))}„. 

The last step turns out be tractable due to the translation into commutative algebraic terms above. We 
recall (see [22]) that homology computation can be performed by putting a matrix in Smith normal form. 
The algorithms for Smith normal form are typically applied for matrices over Z, but they are applicable in 
the context of any principal ideal domain, of which F[t] is one. The fact that the ring is graded and the 
boundary matrices are homogeneous makes the application simpler. This is the observation made in [64], 
where the algorithm for computing persistent homology which we use is described in detail. 

In interpreting the output, one now finds that long intervals in the output barcode indicates the presence of 
a homology class which “persists” over a long range of parameter values, while short intervals indicate cycles 
which are “born” at a given parameter value and then “die” at a nearby parameter value. The pictures and 
discussion above allow us to formulate the intuition that long intervals correspond to large scale geometric 
features in the space, and short intervals correspond to noise or inadequate sampling. Of course, what is 
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short and what is long is very problem dependent. Also, in some cases, it may be a false dichotomy, and 
the more useful point of view is that the barcodes represents the space at various scales, and the whole 
multiscale version of the space may actually be of interest. 


2.4 Example: Natural image statistics 

Images taken with a digital camera can be viewed as vectors in a very high dimensional vector space. Each 
image consists of a number (the gray scale value) attached to each of a large number of pixels, and therefore 
we may think of the image as lying in M p , where P is the number of pixels used by the camera. From this 
point of view, one can ask questions about the nature of the collection of all possible images lying within 
R p . For example, can it be modeled as a submanifold or subspace of M p ? If it were, one could conclude 
on the one hand that it is very high dimensional, since images are capable of expressing such a wide variety 
of scenes, and on the other hand that it would be a manifold of very high codimension, since random pixel 
arrays will very rarely approximate an image. David Mumford gave a great deal of thought to questions like 
this one concerning natural image statistics, and came to the conclusion that although the above argument 
indicates that whole manifold of images is not accessible in a useful way, a space of small image patches 
might in fact contain quite useful information. In [41], A. Lee, D. Mumford, and K. Pedersen performed 
an analysis constructed in this way, and we will summarize the results of that paper. They began with a 
database of black and white images taken by J. van Hateren and A. van der Schaaf in [34]. The database 
consisted of images taken around Groningen, Holland, in town and in the surrounding countryside. Within 
such an image, one can consider 3x3 patches, i.e. square arrays of 9 pixels. 



Each such patch can now be regarded as a 9-tuple of real numbers (the gray scale values again), i.e. a 
vector in K 9 . A preliminary observation is that patches which are constant, or rather nearly constant, 
will predominate among these patches. The reason is that most images have large solid regions, where the 
gray scale intensity does not change significantly, and these regions will contribute more to the collection 
of patches than the patches in which some transitions are occurring. These nearly constant patches will be 
referred to as low contrast. Of course, the low contrast patches do not carry interesting structure, so Lee, 
Mumford, and Pedersen proceeded as follows. They first define the D-norm of a 3 X 3 image patch, as a 
certain quadratic function of the logs of the gray scale values. It is a way of defining the contrast of an image 
patch. Then they select 5,000 patches at random from each of the images from [34], and select the top 20% 
as evaluated by the D-norm. This will now constitute a database of high contrast patches from the patches 
occurring in the image database from [34]. They then perform two transformations on the data, as follows. 

1. Mean center the data. The mean intensity value over all nine pixels is subtracted from each pixel 
value, to obtain a patch with mean zero. This means that if a patch is obtained from another patch by 
adding a constant value, i.e. “turning up the brightness knob”, then the two patches will be regarded 
as the same. Note that this transformation puts all points in an 8-dimensional subspace within M 9 . 

2. Normalize the D-norm. Since all the patches chosen will have D-norm bounded away from zero, one 
can divide by it to obtain a patch with D-norm = 1. This means that if one patch is obtained from 
another by “turning the contrast knob”, then the two patches will be regarded as identical. 
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The result of this construction is a database M of c:a 4.5 x 10 6 points on a 7-dimensional ellipsoid in M 8 . 
The goal of the paper [41] is now to obtain some understanding of how this set sits within S' 7 , and what can 
be said about the patches which do appear. A first observation is that the points are scattered throughout 
the 7-sphere, in the sense that no point on S 7 is very far from the set, but that the density appears to vary a 
great deal. In particular, in [41] indications were found that the data was concentrated around an annulus. 
In [11], a systematic study of the topology of the high density portion was carried out, and this work is what 
we will describe in the remainder of this section. 

The first issue to be addressed is what is meant by “high density” portion. Density estimation is a highly 
developed area within statistics (see for example [58]). We selected a very crude proxy for density, in the 
interest of minimizing the computational burden. It is defined as follows. Fix a positive integer k, and define 
the k-codensity function 5k of x £ X, where X is a set of point cloud data, by 


5 k (x) = d(x,v k (x)) 

where d denotes the distance function in X, and where z'fc(x) denotes the k-th nearest neighbor of x in X. 
Note that 5k(—) is inversely related with density, since a concentrated region will have smaller distances to 
the fc-tli nearest neighbor, so we will be studying subcollections of points for which 5k(~) is bounded from 
above by a threshhold. Secondly, we also note that each 6k yields a different density estimator. In rough 
terms, 5k for large values of k computes density using points in large neighborhoods of x, and for small 
values uses small neighborhoods. So, 5k for large k corresponds to a smoothed out notion of density, and for 
small k corresponds to a version which carries more of the detailed structure of the data set. 

For any subset Mo C M, we now define subsets Mo\k,T] C Mo, where A; is a positive integer, and T is a 
percentage value, by 

Mo[k,T\ = {x e Mo\6k(x) lies among the T% lowest values of 5k in Mo} 

The goal of the paper [11] is to infer the topology of a putative space underlying the various sets of points 
M[k,T\. In some cases we will approximate this via a subset Ma\t,T\. It is fairly direct to see that the 
variable k scales with the size of the set, so that if p= #(M)/#(Mo), then Mo[k,T] is comparable with 
M[pk,T\. We do this by obtaining the data sets Mo[k,T], then selecting a set of landmark points, and 
finding various barcodes attached to witness complexes associated to the space. Below is the barcode for 
i7i(WAfo[300,30])), where Mo is a sample of 5 x 10 4 points from M, and W denotes a witness complex 
constructed with 50 landmark points. 
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k = 300, T = 30% 


We note that there are a number of short lines, and one long one. According to the philosophy mentioned 
above, this suggests that the first Betti number should be estimated to be one. The barcode is stable, in the 
sense that it appears repeatedly when the set of landmark points is varied, and when the sample from the 
full data set is varied. Therefore, the simplest possible explanation for this barcode is that the underlying 
space should be a circle. One can then ask if there is a simple explanation involving the data which would 
yield a circle as the underlying space. The picture below gives such an explanation. 



Primary circle 


More formally, the picture suggests the explanation that the most frequently occurring patches are those 
approximating two variable functions which depend only on a linear projection from the two variable space, 
and so that the function is increasing in that linear projection. This explanation is consistent with an annulus 
conjectured to represent the densest patches in [41]. 

The next picture is a barcode for H\ (lTAfo[15,30])), where Mo is as above, again with 50 landmark points 
chosen. 
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fc = 15,T = 30% 


We note that in this case, there are many short segments and 5 longer segments. It is again the case that 
this barcode is stable over the choices of landmark points and samples from M . This suggests that the first 
Betti number of the putative underlying space should be five. It now becomes more difficult to identify the 
simplest explanation for this result, and a number of such models are possible. The one which ultimately 
turns out to fit the data is pictured as follows. 



Three circle model 


This picture is composed of a primary circle, pictured in black, and two secondary circles, pictured in green 
and red. Although it is perhaps not clear from the image, the intent is that the secondary circles each 
intersect the primary circle in two points, and do not intersect each other. A space constructed this way 
can readily be seen to have a first Betti number of five. An explanation for this geometric object in terms 
of image patches is now the following. 



Three circle model in the data 


The secondary circles interpolate between functions which are an increasing function of a linear projection 
to functions which are “bump functions” with an internal local maximum evaluated on the same linear 
projection. Note that the two secondary circles each intersect the primary circle in two points, as indicated 
by the coloring, and that they do not intersect each other. We informally confirmed that the indicated 
patches are the ones which occur in the high density portions of the data set. 


16 






Remark: We note that there is a preference within the data for patches which are aligned in vertical and 
horizontal directions, since one can construct versions of the secondary circles which are not aligned in the 
vertical direction, and they do not appear. One explanation for this is that patches in natural images are 
biased in favor of the horizontal and vertical directions because nature has this bias, since for example objects 
aligned in a vertical direction are more stable than those aligned at a 45 degree angle. Another explanation 
is that this phenomenon is related to the technology of the camera, since the rectangular pixel arrays in the 
camera have the potential to bias the patches in favor of the vertical and horizontal directions. We believe 
that both factors are involved. In [13], we have studied 5x5 patches, and found the three circles model 
appearing there as well. In that case, one would expect to see less bias in favor of the vertical and horizontal 
directions, since the pixels give a finer sampling of the image. 

One could now ask if there is a larger 2-dimensional space containing the three circle model, occurring 
with substantial density. We will first ask to find a natural embedding of the theoretical three circle model 
in a 2-manifold. It turns out that the model embeds naturally in a Klein bottle (Image courtesy of Tom 
Banchoff). 



To see this, we first recall that the Klein bottle can be described as an identification space as pictured below. 



The colored arrows indicate points being identified using the quotient topology construction (see [33]), which 
informally means that the top vertical edge is identified with the lower vertical edge in a way which preserves 
the ^-coordinate, and the right hand vertical edge is identified with the left hand vertical edge after a twist 
which changes the orientation of the edge. It is convenient to represent the Klein bottle in this way, since it 
does not embed in Euclidean 3-space, and therefore cannot be precisely visualized, although a useful visual 
representation including self intersections was shown above. 

We are interested in finding a sensible embedding of the three circle model in the Klein bottle. Some 
experimentation with the three circle model results in the following picture 
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in which the red segments form the primary circle and the yellow and green segments form the secondary 
circles. The red segments form a single circle since the intersection point of the lower red segment with 
the right (respectively left) vertical edge is identified with the intersection point of the upper red segment 
with the left (respectively right) vertical edge, and the yellow and green segments form circles since the 
intersection of the yellow (respectively green) segment with the upper horizontal edge is identified with the 
intersection of the yellow (respectively green) segment with the lower horizontal edge. We note that the 
yellow and green circles intersect the primary circle in two points, and do not intersect each other, so the 
picture in question produces a natural embedding of the three circle model in the Klein bottle. 

Remark: The selection of the Klein bottle was the result of a great deal of mental experimentation with 
various candidate 2-manifolds, in which we were unable to find similar natural embeddings in other candidate 
manifolds, such as the torus or projective plane. 

In [11], it was demonstrated that the Klein bottle effectively models a space of high contrast patches of high 
density. To understand the results of that paper, it is necessary to discuss another theoretical version of the 
Klein bottle. We will be regarding the 3x3 patches as obtained by sampling a smooth real valued function 
on the unit disc at nine grid points, and study subspaces of the space of all such functions which have a 
rough correspondence with the subspaces of the space of patches we study. We will consider the space Q of 
all two variable polynomials of degree 2, i.e. functions 


f(x, y) = A+ Bx + Cy+ Dx 2 + Exy + Fy 2 

The set Q is a 6 dimensional real vector space. We now consider the subspace PCg consisting of functions 
/ so that 

/ / = 0 and [ f 2 = 1 (2-1) 

Jd Jd 

The first condition is the analogue of the mean centering condition imposed on the patches, and the second 
is the analogue of the normalization condition for the contrast. Imposing only the first condition, which is 
linear, produces a 5 dimensional vector subspace, and the second, which is quadratic in character, produces 
a 4 dimensional ellipsoid within this vector space. We now consider the subspace Vo C V, consisting of all 
functions within V which are of the form 


f(x, y) = q(Xx + yy) 

where q is a single variable quadratic function, and where X 2 + y 2 = 1. The space of all functions of this 
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form within Q is four dimensional (three variables parametrize q, and (A, fi) lies on the (one-dimensional) 
unit circle). The two additional constraints in 2-1 above imposed on it now yield the 2-dimensional complex 
Vo- We claim that Vo is homeomorphic the Klein bottle. To see this, we let A denote the space of single 
variable polynomials q(t) = c 0 + C\t + c 2 t 2 satisfying the two conditions 


J q(t ) = 0 and J q(t) 2 = 1 

It is easy to check that regarded as a subspace of R 3 , this subspace is an ellipse and therefore is homeomorphic 
to a circle. For any unit vector v in M 2 and any q £ A, we let : R 2 —> R be defined by qv(w) = q(v ■ w). 
It is easy to check that for q e A and v a unit vector, we have that 


and that therefore the formula 


and 


/ «I/o 

Jd 




Qv 



defines a continuous map 6 from Ax S 1 to Vo- The map 0 is however not a homeomorphism, which one can 
check as follows. Let p : A —> A be the involution defined by p(co + C\t + c 2 i 2 ) = Co — cy t + c 2 t 2 . Then we 
have the relation 


6(q,v) = 0(p(q), —v) 

so that the map 9 factors through the space of orbits under the involution. It is easy to check (a) that the 
factorization is a homeomorphism and (b) that the orbit space is homeomorphic to a Klein bottle. 

We now ask to what extent we can “see” a Klein bottle in the data. A naive approach to this question would 
be to simply perform the experiments we did above for k = 15 but with a less stringent density threshhold, 
for example 40 or 50. Performing these experiments do not produce a non-trivial fh- One might suppose, 
though, that the set Mo is not large enough to provide sufficient resolution in the density estimation, and 
that if one uses the full set M that one might obtain different answers. With this larger set, one might also 
vary the estimator parameter k to obtain a finer estimation of density. After some experimentation, one can 
construct a data set S by sampling 10 4 points from the data set M [100,10]. The set S exhibits the three 
circle model clearly, but enlarging it still does not exhibit the non-trivial /? 2 which would be characteristic 
of the Klein bottle. In order to begin to understand the situation, one should then ask what the least 
frequently appearing patches on the Klein bottle would be. In thinking about the polynomial model, one 
expects that there should be a preference for the linear polynomials, and the experience with the three circle 
model suggests that there is an additional preference for the patches which are lined up with the vertical 
directions. This suggests that the least frequently appearing patches would be the pure quadratics composed 
with the linear functions (x,y) —► ^-x + and (x, y) —> Slightly more frequently appearing 

would be pure quadratics composed with any linear function which is not a projection on the x and y-axes. 
This set of pure quadratics forms a pair of open one-dimensional arcs within the Klein bottle. These arcs 
are indicated in blue in the image below, where as before the red circle arcs form the primary circle, and the 
yellow and green arcs the secondary circles. 
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The idea will be that we wish to include some points corresponding to the blue arcs to the data set, but 
which turn out not to satisfy the density threshhold. We will find that once these points are added, the 
set then carries the topology of a Klein bottle. This would then give a reasonable qualitative description 
for a portion of the density distribution of the high contrast 3x3 image patches, in that it can be said 
to concentrate around a Klein bottle, but with strongly reduced density around the non-vertical and non¬ 
horizontal pure quadratic patches. In order to find such a map, we identify the pixels in the 3x3 array with 
the points in the set L = {—1,0,1} x {—1,0,1} in the Euclidean plane. For any given function f G V o, we 
define the associated patch by evaluating / at the nine points of L, and then mean centering and D-norm 
normalizing. This produces a map h from the Klein bottle Vo to the normalized patch space. We then obtain 
additional points to add to the data set by selecting 30 points at random from the blue arcs in the Klein 
bottle, computing h on them, and then for each of the 30 points selecting the points of M. [100,10] which 
he closest to them, and finally adjoining them to the set S to obtain an enlarged set S'. Witness complexes 
with 50 landmark points computed for S' now display the barcodes which would be associated to a Klein 
bottle. Here is a typical picture. 



We note that the /fy barcode (the upper picture) clearly shows the single component, the shows two lines 
from threshhold parameter value .15 to .35, and finally the (3 2 barcode shows a single line on roughly the 
same interval. This gives /3q = 1, = 2, and fh = 1 ■ These are the mod 2 Betti numbers for the Klein 

bottle. 

Remark: There are actually two two-dimensional manifolds with these mod 2 Betti numbers, one is the 
Klein bottle, and the other is a torus. These two are distinguished by mod 3 homology, and we have 
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performed the computation to show that the mod 3 homology is consistent with the Klein bottle and not 
with the torus. 

Remark: One can also ask if the space we are studying is in fact closely related the theoretical Klein bottle 
defined above using quadratic polynomials. That it is so is strongly suggested in [11] by a comparison with 
data sets constructed by adjoining additional points obtained from the whole theoretical Klein bottle to the 
set S. The resulting space also shows a strong indication of the same Betti numbers, indicating that the 
spaces represent essentially the same phenomenon. 


2.5 Example: Electrode array data from primary visual cortex 

The goal of neuroscience is, of course, to obtain as complete as possible an understanding of how the nervous 
system operates in performing all its tasks, including vision, motor control, higher level cognition, olfactory 
sensation, etc.. One aspect of this kind of understanding is the analysis of the structure and function of 
individual neurons, and the creation of an associated taxonomy of individual neurons. Another aspect is the 
analysis of how families of neurons cooperate to accomplish various tasks, which could be referred to as the 
study of populations of neurons. The second problem appears to be very amenable to geometric analysis, 
since it will involve the activities of several neurons at once. In the paper [59], a first attempt at topological 
analysis of data sets constructed out of the simultaneous activity of several neurons is carried out, with 
encouraging results, and we will describe the results of that paper in this section. 

The arrays of neurons studied in [59] are from the primary visual cortex or VI in Macaque monkeys. The 
primary cortex is a component in the visual pathway, which begins with the retinal cells in the eye, proceeds 
through the lateral geniculate locus, then to the primary visual cortex, and then through a number of higher 
level processing units, such as V2, V3, V4, middle temporal area (MT), and others. See [36] and [63] for useful 
discussions. It is known that VI performs low level tasks, such as edge and line detection, and its output is 
then processed for higher level and larger scale properties further along the visual pathway. However, the 
mechanism by which it carries out these tasks is not understood. A very interesting series of experiments 
were conducted in the papers [62] and [38]. These authors study the behavior of the VI in Macaque monkeys 
by injecting a voltage sensitive dye in it, and then performing optical imaging of small regions of the cortex. 
Voltage changes in this portion of the cortex will then give rise to color differences in the imaging. Since 
the voltages change over time, so will the optical images. These papers study the behavior of the optical 
images under two separate conditions, one the evoked state, in which stimuli are being supplied to the eye of 
the monkey, and the unevoked or spontaneous state, in which no stimulus is being supplied. It was observed 
in [38] that in an informal sense, the images in the different conditions appeared to be quite similar, and a 
statistical analysis strongly suggested that the behavior of VI in the spontaneous condition was consistent 
with a behavior which consisted in moving through a family of evoked images corresponding to responses to 
angular boundaries, without any particular order. 

Another method for studying the behavior of neurons in VI and other parts of the nervous system is the 
method of embedded electrode arrays. In this case, arrays of up to c:a 100 regularly spaced electrodes are 
implanted in the VI (or whatever other portion of the nervous system one is studying) . The voltage at the 
electrodes are then recorded simultaneously, so one obtains a voltage level at each of the electrodes at each 
point in time. Sophisticated signal processing techniques are then used to obtain an array of N (where N 
is the number of electrodes) spike trains , i.e. lists of firing times for N neurons. This experimental setup 
provides another view into the behavior of the neurons in VI, and the idea of the paper [59] was to attempt 
to the replicate the results of [38], which were carried out using the voltage sensitive dye technology, in 
the embedded electrode setting and to attempt to refine the results presented there. We now describe the 
experiments which were carried out, and the results of our analysis of the data obtained from them. 

10 x 10 electrode arrays were used in recording output from the VI in Macaque monkeys who view a screen. 
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Two segments of recording, each of roughly 20-30 minutes, were done under two separate conditions. In 
the first, the eyes of the animal were occluded, so no stimulus was presented to the visual system. In the 
second, a video sequence obtained by sampling from different movie clips. We refer to the data obtained 
in the first setting as spontaneous, and in the second setting as evoked. A signal processing methodology 
called spike sorting was then applied to the data, so that one could identify neurons and firing times for each 
neuron. Next, the data was broken up into ten second segments in both cases. Each such segment was next 
divided up into 200 50ms bins, and for each neuron one is able to count the number of firings within each 
such 50ms bin. The five neurons with the highest firing rate were selected in each 10 second window, and for 
each bin one can now obtain the 5-vector of number of firings of each of these five neurons. By performing 
this construction over all 200 bins, one obtains a set of point cloud data consisting of 200 points in M 5 . Of 
course, we have many such data sets, coming from different choice of 10 second segments and from different 
choice of “regime” (spontaneous or evoked). 

Beginning with these point clouds, witness complexes based on 35 landmark points were constructed. The 
landmark sets were constructed by the “maxmin” procedure, a procedure designed to ensure that the land¬ 
mark points are well distributed throughout the point cloud. This procedure begins with a seed point, and 
then constructs the rest of the points deterministically from it. For each data set, we constructed witness 
complexes from all the possible seed points. In order to derive topological signatures from each such witness 
complex, one selects a threshhold as a fraction of the covering radius of the point cloud, and then determines 
the Betti numbers (3$, 0i, and 02 of the witness complex with this given threshhold value of e. Thus, for each 
witness complex, one can now obtain a vector or signature of integers (0o, 0i, 02)- The observed signatures 
are listed below, with pictures of simple models of possible geometries which they represent. 


o® © ooo 
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o 

1110 I 1131 I 1121 I I 111 1 


1112 I 1123 I 1101 I 1122 | 


By far the most frequently occurring signatures were (1,1,0) and (1,0,1), corresponding to a circle and a 
sphere, respectively. The picture below shows the distribution of occurrences of these two under various 
choices of the threshholds. 
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Spontaneous Natural image sequences 




In order to validate the significance of the results, we ran the identical procedures with data generated at 
random for firings with a Poisson model. Monte Carlo simulation show that the probability of obtaining 
segments for /?i or /3 2 longer than 30% of the diameter of the point cloud is < .005. 

The summary of the results of the experiment are that topological methods clearly distinguish the data 
in both regimes from a Poisson null hypothesis model. They also suggest that there is a similarity in 
the spontaneous and evoked regimes, since the same topological signatures occur. Further, though, the 
statistics of the signatures occurring are also able to distinguish between the two regimes. We do not yet 
understand the nature of the topological phenomenon, which is something which should be addressed by 
mapping algorithms, perhaps along the lines of the following section. One aspect we have addressed in this 
direction, though, is the question of whether a simple periodic phenomenon associated the body’s natural 
rhythms are responsible for the topology. Such a phenomenon would likely create peaks in the amplitude 
spectrum of the segments of the data we study. No statistically significant peaks of this type were observed. 


3 Imaging: Mapper 

3.1 Visualization 

So far we have discussed the attachment of homological signatures to point clouds in an attempt to obtain 
geometric understanding of them. Frequently, though, it is possible to find images of various kinds attached 
to point cloud data which allow one to obtain a qualitative understanding of them through direct visualizaton. 
One such method is the projection pursuit method (see [37]), which uses a statistical measure of information 
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contained in a linear projection to select a particularly good linear projection for data which is embedded 
in Euclidean space. Another method is multidimensional scaling, (see [1]), which begins from an arbitrary 
point cloud and attempts to embed it isometrically in Euclidean spaces of various dimensions with minimum 
distortion of the metric. Related developments are the Isomap (see [61]) and locally linear embedding (see 
[56]). In all cases, the methodologies result in a point cloud in R 2 or R 3 , which can then be visualized by 
the investigator. There are, however, other possible avenues for visualization and qualitative representation 
of geometric objects. One such possibility is representation as a graph or as a higher dimensional simplicial 
complex. Such combinatorial representations can lead to useful qualitative understanding in their own right, 
but graph visualization software such as Graphviz (available at http://www.graphviz.org/) can provide useful 
visualizations. In thinking about how to develop such a representation, it is useful to keep in mind what 
characteristics would be desirable. Here is a list of some such properties. 


• Insensitivity to metric: As mentioned in the introduction, metrics used in analyzing many modern 
data sets are not derived from a particularly refined theory, but instead are constructed as a reasonable 
quantitative proxy for an intuitive notion of similarity. Therefore, imaging methods should be relatively 
insensitive to detailed quantitative changes. 

• Understanding sensitivity to parameter changes: Many algorithms require parameters to be set 
before an outcome is obtained. Since setting such parameters often involves arbitrary, it is desirable to 
use methods which provide useful summaries of the behavior under all choices of parameters if possible. 

• Multiscale representations: It is desirable to understand sets of point cloud at various levels of 
resolution, and to be able to provide outputs at different levels for comparison. Features which are 
seen at multiple scales will be viewed as more likely to be actual features as opposed to more transient 
features which could be viewed as artifacts of the imaging method. 


The rest of this section will be devoted to the description of a method which addresses each of these points. 
We have named it Mapper, and it is described in detail in [60]. 


3.2 A topological method 

We begin with a topological construction based on a covering of a topological space X. 

Definition 3.1 Let X be a topological space, and let U = {U a } ae A be a finite covering of the space X (so the 
set A is finite). Let A [A] denote the standard simplex with vertex set A, so dim(A[A]) = #(A) — 1. Further, 
for any non-empty subset S C A, we let A[,S] C A [A] denote the face spanned by the vertices corresponding 
to elements of S, and we let X [5] = f) seS Us Q X. By the Mayer-Vietoris blowup of X associated to U, 
denoted by M.(X,U), we mean the subspace 

(J A [S] x X[S] C A [A] x X 

We note that there are natural projection maps / : M.(X,U) —* X and g : M.(X,U) —> A [A], which have 
the following properties. 


The map / is a homotopy equivalence when X has the homotopy type of a finite complex and the 
covering consists of open sets. In fact, using a partition of unity subordinate to the covering U, one 
can obtain an explicit homotopy inverse ip : X —> M.(X,U). 
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• The map g is a homotopy equivalence onto its image (which is the nerve of the covering, or the Cech 
complex C{U)) when all the sets X[S'] are either empty or contractible. This is how the nerve theorem 
(Theorem 2.3) is proved. 


These two observations demonstrate that one obtains a map from X to C(U) for any finite complex X. 
Such a map can be viewed as a kind of coordinatization of the space X. Ordinary coordinatizations provide 
maps to Euclidean spaces of various dimensions, and they often provide useful insights into the spaces in 
question. Simplicial complexes, particularly low dimensional ones, can also often be readily visualized, and 
can therefore also be expected to provide useful information about a space. This is so even if the map is 
not a homeomorphism, so it does not provide complete information about a space. We will next observe 
that there is a variant of the C{U) construction which is a somewhat more sensitive target for this kind of 
coordinatization map. Let X be any topological space, and let U be any covering of X. We will now define a 
simplicial complex C n ° (U) to be the nerve of the covering of X by sets which are path connected components 
of a set of the form U a , so the covering is indexed by the set of pairs {(a, £)}, where £ is a path component 
of U a . The set map (a, £) —> a yields a map of simplicial complexes C m,1 {U) —► C(U). It is further clear that 
we have a map M(X,U) —► C n ° (U ), which is induced by the projection U a —> no(U a ) for each a. Finally, 
the composite 


M{X,U) &»(U) -h. C{U) 


is the earlier defined map g. 

Example: Let X denote the unit circle, and let a covering U of X be given by the three sets A = {(x, y)\y < 
0}, B = {(x,y)\y > 0}, and C = {(x, y)\y ^ ±1}. We note that no(A) and tto(-B) consist of a single point, 
and tto(C) consists of two points. 

o 

The simplicial complexes C ,7r ° (U ) and C(U) are now given by the picture 



Note that C n °(U) is actually homeomorphic to X, while C(U) is not. This is an example of the fact that 
C 7r ° is more sensitive that C. 

In order for this construction to be useful, one must develop methods for constructing coverings of topological 
spaces. Earlier we have looked at constructions where the sets are open balls in a metric space, and where 
we have versions of the Voronoi decomposition adapted to general metric spaces. We now suppose that we 
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are given a reference map p from our space X to a metric space Z , and further that we are given a finite 
open or closed covering U of Z. Then we may consider the covering p*U given by p*U = {p~ [ U a } ae A, when 
U = {U a }aeA- This is clearly a covering of X. Typical examples of useful metric spaces Z include R, R", 
and S 1 . These spaces admit natural coverings. 

Example: For X = R, let R and e be positive real numbers. Then we may construct the covering U[R, e] 
which consists of all intervals of the form [kR — e, (k + 1)R, + e]. This is a two parameter family of coverings, 
and as long as e < f, it has covering dimension 1, in the sense that no non-trivial threefold overlaps are 
non-empty. Products of these intervals would give a corresponding covering of I™. 

Example: Let X = S 1 ,N be an integer > 2, and e > 0 be a real number. Then we can form a covering 
U[N,e\ = {U.j}o<j < N of X by setting 


Uj = {{cos{x),sin{x))\x e [— - e, + e]} 


whenever e > -j^. 

Remark: When the reference space is R, our construction is closely related to the Reel) graph of a real 
valued function on a manifold (see [55]). The actual Reeb graph should be viewed as a limiting version of 
the construction as one studies the coverings U[R, e] with R and e tending to zero. 

We must now describe a method for transporting this construction from the setting of topological spaces 
to the setting of point clouds. The notion of a covering makes sense in the point cloud setting, as does 
the definition of coverings of point clouds using maps from the point cloud to a reference metric space, by 
“pulling back” a predefined covering of the reference space. The notion which does not make immediate 
sense is the notion of 7To, i.e. constructing connected components of a point cloud. The notion of clustering 
(see [31]) turns out to be the appropriate analogue. Our main example of such a clustering algorithm will be 
the so-called single linkage clustering. It is defined by fixing the value of a parameter e, and defining blocks 
of a partition of our point cloud as the set of equivalence classes under the equivalence relation generated 
by the relation defined by x ~ e x' if and only if d(x. x') < e. Note that the set of clusters in this setting 
is precisely 7ro applied to the Vietoris-Rips complex VR(X, e), and that each “cluster” corresponds to the 
set of vertices in a single connected component. Now, our version of the construction C' 7r ° in this context is 
obtained as follows. 

1. Define a reference map / : X —> Z, where X is the given point cloud and Z is the reference metric 
space. 

2. Select a covering U of Z. \i Z = R, then U can be obtained by selecting R and e as above, and 
constructing the covering U[R, e], 

3. If U = {U a } a( zA, then construct the subsets X a = f~ l U a . 

4. Select a value e as input to the single linkage clustering algorithm above, and construct the set of 
clusters obtained by applying the single linkage algorithm with parameter value e to the sets X a . At 
this point, we have a covering of X parametrized by pairs (a, c), where a £ A and c is one of the 
clusters of X a . 

5. Construct the simplicial complex whose vertex set is the set of all possible such pairs ( a , c), and where 
a family {(oq, Co), (aq, Ci),..., (a*, c*,)} spans a fc-simplex if and only if the corresponding clusters have 
a point in common. 

This construction is a plausible analogue of the continuous construction described above. We note that it 
depends on the reference map, a covering of the reference space, and a value for e. We observe that in 


26 





fact any clustering algorithm could be used to cluster the sets X a , and one could still obtain a sensible 
construction. We note that if the covering U. has covering dimension < d, i.e. if whenever we are given a 
family {ao, aq,.. . , a^} of distinct elements of a with t > d, then U ao n . . . fl U at = 0, then the dimension of 
the simplicial complex we construct will be < d as well. This follows immediately from the definitions. 

We note that this construction readily produces multiresolution or multiscale structure which allows one to 
distinguish actual features from artifacts. To see this, we begin with the definition of a map of coverings. 
Let U = {U a } ae A and V = {Values be coverings of a space Z. By a map of coverings from U to V we will 
mean a set map 9 : A —> B so that for all a £ A, we have U a C 14(a)- 

Example: Consider the coverings lA[R,e\ of R defined above. The indexing set in this case consists of the 
integers. It is clear from the definition that the identity map from Z to itself yields a map of coverings 
U[R, e] —> IA[R, e'] whenever e < e!. In this case, the map of coverings consists simply of the inclusion of an 
interval into an interval with the same center but with larger diameter. 

Example: The map of integers k —> |_§J defines a map of coverings U[R,e] —> IA[2R,e], which is two to 
one in the sense that every interval in U [2R, e] contains two intervals from U. In order to use these maps to 
obtain a multiresolution version of the Mapper construction, we need a definition. 


Definition 3.2 A clustering algorithm is said to be functorial if whenever one has an inclusion X —> Y 
of point clouds, i.e. a set map preserving distances, then the image of each cluster constructed in X under 
f is included in one of the clusters in Y. It follows from the fact that the clustering algorithm produces a 
partition of the point cloud in question that each cluster is contained in a unique cluster, and therefore that 
we have an induced map of sets from the clusters in X to the clusters in Y. 


Now suppose we are given data for applying Mapper, i.e. a point cloud X together with a reference map p 
to a metric space Z. Suppose further that we are given two coverings U = {U 0 } ae A and V = {Vp}p e B of Z, 
and a map of coverings 9 : A —> B. Since we have a map of coverings, it is clear that we obtain inclusions 
p~ 1 U a Q p~ x Veto) for all a & A. If we apply a functorial clustering scheme to each of the sets p~ x U a and 
p~ 1 V 3, it is clear from the definition that we will obtain a map from the set of clusters obtained by applying 
the clustering algorithm to p~ 1 U a to the set of clusters obtained by applying it to p~ 1 Vp, and therefore a 
map from the vertex set of M(X,IA) to the vertex set of M(X,V). One readily checks that it is actually a 
simplicial map, so we obtain an associated simplicial map © : M(X,IA) —* M(X, V). So now, for example, 
we will always obtain a diagram of simplicial complexes 


-♦ M(X,IA[R/ 4,e]) M{X,U{R/ 2,e]) -► M(X,lA[R,e}) 

As one moves to the left, the coverings of M (and therefore of X) become more refined, and are presumed to 
give picture with finer resolution of the space in question. Studying the behavior of features under such maps 
will allow one to get a sense of which observed features are real geometric features of the point cloud, and 
which are artifacts, since the intuition is that features which appear at several levels in such a multiresolution 
diagram would be more intrinsic to the data set than those which appear at a single level. 


3.3 Filters 

An important question, of course, is how to generate useful reference maps p. If our reference space Z is 
actually R", then this means simply generating real valued functions on the point cloud. To emphasize the 
way in which these functions are being used, we refer to them as filters. Frequently one has interesting 
filters, defined by a user, which one wants to study. However, in other cases one simply wants to obtain a 
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geometric picture of the point cloud, and it is important to generate filters directly from the metric which 
reflect interesting properties of the point cloud. Here are some important examples. 


• (Density): Consider any density estimator applied a point cloud X. It will produce a non-negative 
function on X, which reflects useful information about the data set. Often, it is exactly the nature of 
this function which is of interest. 

• (Data depth): The notion of data depth refers to any attempt to quantify the notion of nearness 
to the center of a data set. It does not necessarily require the existence of an actual center in any 
particular sense, although a point which minimizes the quantity in question could perhaps be thought 
of as a choice for a center. In our group’s work, we have referred to quantities of the form 


e P (*) = ^X) ^ d ( x ’ x ') P 

(with an obvious generalization to p = oo) as eccentricity functions, and have used them as filters. 
Other notions could equally well be used. The main point is that Mapper output based on such 
functions can identify qualitative structure of a particular kind. For example, if the space were as 
pictured below, 



then Mapper would recover the structure of the three flares coming out from the central point. 

(Eigenvectors of graph Laplacians): Graph Laplacians are interesting linear operators attached 
to graphs (see [40]). In particular, their eigenfunctions produce functions on the vertex set of the 
graph. They can be used, for example, to produce cluster decompositions of data sets when the graph 
is the 1-skeleton of a Vietoris-Rips complex. We find that these eigenfunctions (again applied to the 
1-skeleton of the Vietoris-Rips complex of a point cloud) also can produce useful filters in Mapper 
analysis of data sets. 


3.4 Scale space 

The construction from the previous subsection depends on certain inputs, including a parameter e. The 
decision of how to choose this parameter is in principle a difficult one, for which one has little guidance. 
Further, it may often be desirable to broaden the definition of the complex to permit choices of e which 
vary with a, i.e. over the reference space Z. In this section, we discuss a systematic way of considering 
such varying choices of “scale”. We first note that the because the single linkage procedure applied to a 
point cloud X can be interpreted as computing connected components of VR(X, e), the persistence barcode 
for /?o yields interesting information about the behavior of the components (or clusters) for all values of e. 
To be explicit about this, we consider the subset E(X) C M+ = [0,+ 00 ) consisting of all the endpoints 
of the intervals occurring in the barcode. E(X) is a finite set on the non-negative real line, and there is 
consequently a total ordering on it induced from the total ordering on R + , and we write E(X) = {ei,..., e t }, 
with e* < &j whenever i < j. From the definition, it is clear that whenever we have e* < 77 < rf < e.i + -\ , the 
natural map on H t) induced by the inclusion VR(X,rj) VR(X, rf) is an isomorphism, and that therefore 
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the inclusion induces a bijection on connected components. For this reason, we call each of the intervals 
(ej,e,_|_i) a stability interval, or an S-interval for X. We now have the following definition. 


Definition 3.3 Given a point cloud X, a reference map p : X —> Z to a metric space, and a covering 
U = {U a } ae A, we define a simplicial complex SS = SS(X, p,U) as follows. The vertices of SS are pairs 
(a, I), where a G A, and where I is a stability interval for the point cloud X a = p _1 ( U a ). A (k + 1 )-tuple 
{(a 0 ,1 0 ), (ai, 7i),..., (a k , I k )} spans a k-simplex in SS if (a) U ao fl... D U ak ± 0 and (b) I 0 D ... fl I k ^ 0. 
The vertex map (a, I) —> a induces a map of simplicial complexes p : SS —► C(U). By a scale choice for X 
and U, we will mean a section of the map p, i.e. a simplicial map s : C(U) —> SS such that p° s = Idgqjy 

Given any scale choice s for X and IT, we set s(a) = ( a,I a ). Now, for any scale choice s and a G A, 
we choose e a e I a . This gives a choice of the scale parameter e varying with a, and we can build a new 
complex whose vertex set consists of pairs (a, c), where c is a cluster in the single linkage clustering applied 
to X a = p~ x U a with perimeter value e a . From the definition of the stability intervals, it is clear that the 
complex is independent of the choice of e a G I a . 

Remark: The intuition behind the definition of scale choice is the following. We wish to permit a choice 
of scale parameter e which varies with a. Of course, the set of all such choices is too large to contemplate 
using any kind of exhaustive enumeration of the possible values, and will in any case not be useful since we 
will not have any criteria to determine which choices are more plausible than others. The definition given 
above incorporates two different heuristics which permit us to restrict the choices of e a which we make, as 
well as to evaluate various choices relative to each other. 

• From the fact that the scale choice s is a simplicial map, it follows that whenever U a PI U a j , we also 
have I a fl I a /. This means that the choices of parameters e Q have a certain kind of continuity in the 
variable a, which is surely a desirable feature of a varying choice of scales. 

• The fact that the stability intervals have a notion of length allows us to evaluate scale choices. The 
general rule of thumb is that choices of scale which are stable over a large range of parameter values are 
to be preferred over those with stability over a shorter range. This permits various notions of numerical 
weights (such as, for example, K^a), where l(-) denotes length) which allow one to compare scale 
choices. 


3.5 Examples 

We show the outputs from Mapper applied to various data sets. The first example comes from a six 
dimensional data set constructed by G.M. Reaven and R.G. Miller from a diabetes study conducted at 
Stanford University in the 1970’s. 145 patients were included. Details of the study and of the construction 
of the data set can be found in [48]. Below is the output of Mapper applied to this data set, with two 
different levels of resolution. 
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The filter is in this case a density estimator, and high values are indicated in red, and low density values in 
blue. At both scales, there is a central dense core, and two “flares” consisting of points with low density. The 
core consists of normal or near-normal patients, and the two flares consist of patients with the two different 
forms of diabetes. An imaging coming from the projection pursuit method is given below. 



A second example comes from the paper [61], and consists of scanned images of handdrawn copies of the 
digit “two”. 



The images are compared using a simple L 2 -metric, and Mapper is applied using a density filter. One can 
observe that the dominant feature which is changing as one moves along the graph is the increasing presence 
of a loop in the lower left hand corner of the digit. This result is consistent with what was obtained by using 
ISOMAP in [61]. 
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The picture above is constructed using a data set constructed with the folding@home project (http://folding.stanford.edu/), 

by simulating the folding of so-called “RNA hairpins”. These are relatively small so-called motifs occurring 

within larger RNA molecules, and are among the most frequently occurring such motifs. The actual data 

set is obtained from a probabilistic simulation of the dynamics, based on the notion of a contact map. The 

contact map is simply an array of zeroes and ones, whose slots correspond to the residues along the molecule, 

and where an entry is one if the corresponding pair of residues are in contact with one another, by which we 

mean that they are within a fixed threshhold of each other. One can impose a Hamming style metric on the 

set of these contact maps. At this point, given a family of such contact maps generated by simulations, we 

can employ a filter which is a good proxy for density, and apply Mapper. The output from this application is 

the colored graph displayed above. One notes that in the middle, one has some slightly complicated behavior 

among the orange nodes, in particular a loop in the corresponding graph. The contact maps corresponding to 

members of the clusters corresponding to these nodes are displayed below the Mapper output. The contact 

maps are displayed by inserting edges between residues which are in contact. We note that given only the 

Mapper output, one might suspect that the small feature (the array of orange nodes) could simply be an 

artifact, but examination of the data shows that they correspond to essentially distinct contact maps. Note 

also that the data is obtained by simply examining the states occurring in the simulation, and that it does 

not include any dynamic information which would show how the states are traversed in the folding process. 

This example points out an advantage of the method, in that it is capable of locating small features within 
a larger data set. The results described above appear in [6]. 


4 Generalized forms of persistence 

4.1 Multidimensional persistence 

We have studied families of spaces parametrized by a single parameter e as a way of extracting connectivity 
information from a point cloud or finite metric space. It turns out that it is often useful to be able to analyze 
the behavior of increasing families of spaces parametrized by more than one variable. 

Example: Given a point cloud X, one often attempts to understand the nature of an underlying probability 
distribution which may have given rise to it. This was clearly the case in the example of the image patch 
data described in Section 2.4 above. One way to do this is to estimate the density function using one of many 
possible density estimators (see [58]). Given such an estimator, one can now construct the family of spaces 
A[T], where T is a percentage parameter, and X[T\ Cl is the subset of points which lie within the T-th 
percentile of density as measured by the given density estimator. Clearly, if T < T', we have an inclusion 
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X\T\ C X[T'}, and one is often interested in understanding the geometric evolution of this set as T increases. 
In the image patch example above, we chose a few possible values of T to locate levels for T in which we 
saw interesting topological behavior, but if one were to be able to study all the values of T simultaneously, 
and obtain a summary of the behavior across all values of T, then one would not have to search at random 
through the different values of T. In this case, we note that when we apply, for example, the Vietoris-Rips 
construction to these sets, one obtains a two parameter family of simplicial complexes {VR(X[T], e)} e ,T- 
The parameter e is used to introduce geometry into the discrete sets X[T\, and the parameter T is the 
function value we wish to track. 

Example: Persistent methods can be used to study qualitative properties of shapes which are not directly 
topological. For example, if one has a manifold, one can study the filtration on the manifold by the value of 
scalar curvature, and the evolution of the topology of the sublevel sets of this filtration can reflect interesting 
properties of the shape, and can provide the basis for methods for discriminating between shapes. In 
addition, one can build associated spaces to manifolds or other complexes by studying various versions of the 
tangent bundle or the tangent cone of geometric measure theory, which can also be equipped with interesting 
filtrations which provide interesting information about the shape, and which can also be used in locating 
features such as singularities in the space. See [10], [17], [28], and [7] for details of these lines of research. 
In order to study this kind of persistence for spaces given as point clouds, it is necessary to find discrete 
versions of the geometric quantities (such as curvature) which are relevant, but it is also necessary to use 
multiple persistence based on the geometric quantity and the scale parameter e simultaneously. As in the 
density example above, one needs the e parameter in order to impose some geometry on the discrete point 
cloud one is given. 

Example: Suppose one is interested in studying the qualitative behavior of a real valued function / on 1", 
in terms of local maxima, minima, saddle points, etc. An efficient way of doing this is to study the evolution 
of the topology of the sublevel sets Sr = {x G W l \f(x) < R}, as is done in Morse theory (see [49]). If one 
does not have the explicit form of /, but only the values of / on some grid or other sample S of points in 
R n , one can approximate the topology of the sublevel sets by the Vietoris-Rips complexes of <S fl Sr, and 
study their evolution as R increases. Of course, to extract the topology, one also needs the allow the scale 
parameter e in the Vietoris-Rips complexes to vary, and one obtains a two parameter family of simplicial 
complexes {VP(<S fl S R , e)} e R . 

It is clear from these examples that it very desirable to obtain useful and computable summaries of the 
evolution of topology in situations where there is more than one persistence parameter. A theory which 
describes how this can be done is developed in [12]. We now describe this theory. 

We recall from Definition 2.9 the notion of a P-persistence object in a category ( 7 , where V is a partially 
ordered set, as a functor P —> C, where P regards P as a category in the usual way. The morphisms of 
P-persistence objects are natural transformations of functors. Suppose we are given a family of topological 
spaces (or simplicial complexes) {V s t }, with inclusions X s t —> X s ’ f j whenever s < s' and t < t'. By 
choosing any order preserving map N x N -> R x I, we obtain a N x N-persistence object in the category of 
topological spaces (simplicial complexes). Of course, this can be carried out for more than two variables in 
the obvious way. One can now apply any of the homology functors i7,(—) to obtain an N x N-vector space. 
We recall from Section 2.3 that the category of N-persistence vector spaces is equivalent to the category of 
non-negatively graded fc[t]-modules. There is a corresponding statement concerning N s -persistence vector 
spaces. 

Definition 4.1 By an n-graded ring, we will mean a ring A together with a direct sum decomposition of 
abelian groups 




where each of the ti’s varies over N, and where the multiplication in the ring A satisfies the requirement 


Similarly, an n-graded module over an n-graded ring A is an A-module M equipped with a direct sum 
decomposition 

M = 0 

so that the requirement 

A Slt s 2 ,..;S„ • C M Sl+tli ... iSri+tn 

is satisfied. Notions of homomorphism and isomorphism of n-graded rings and modules are defined in the 
obvious ways, making the collection of n-graded A-modules into a category. 


The following proposition from [12] is now a straightforward observation. 


Proposition 4.2 The category o/N "-persistence vector spaces over a field k is equivalent to the category of 
n-multigraded modules over the polynomial ring A(n) = k[xi,X2, ■ ■ ■ , a; n ], where the multigrading structure 
on A(n) is given by A(n) tl> t 2 ,...,t„ = k ■ x^xtf ■ ■ ■■x t ". 

As is well known to algebraists, the classification of finitely generated modules over multivariable polynomial 
rings is much more complicated than the corresponding result for single variable polynomial rings, and in 
fact no reasonable parametrization is known. This situation is also valid in the graded and multigraded 
cases. The classification of finitely generated graded modules in the single variable case is parametrized by 
a set which is independent of the field in question, while examples show that in the case of more than one 
variable, the classification of multigraded modules definitely depends on the field in question. In fact, the 
classification in the multivariable case is parametrized by points in moduli varieties over the ground field, and 
we therefore say that the one variable classification is discrete while the classification in the multivariable 
case is continuous. This observation is initially disappointing, since it suggests that useful classification 
results are not likely to be available. However, it turns out that there are useful invariants, even though they 
are not complete. 


Definition 4.3 Let M be any finitely generated n-graded A(n)-module. Then for any vector 


t = {ti,t2, ■ ■ ■ ,t n ) € N" 

we define d(t) to be the dimension of the vector space Mp. Similarly, for any pair of vectors Lf € N", with 
t<V in the sense that ti < f' for all i, we define r(t, t?) to be the rank of the multiplication map 

a4 2_t2 ' ■ • x *£~ tn • : Mp —► Mp 

The assignments t —> d(t) and ( t, i?) —> r(t,P) can be regarded as N -valued functions on the sets Z n and 
V„ = e2"x Z n | t< t?} respectively. These functions are clearly invariants of the isomorphism class 

of the module M, and we refer to them as the dimension and rank invariants, respectively. 


The following result is proved in [12]. 

Proposition 4.4 In the case n= 1, the rank invariant is a complete invariant of the isomorphism class of 
a finitely generated graded F[t]-module for F a field. 
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This suggests that the computation of the rank invariant could be regarded as a suitable generalization of 
single variable persistence, even though it clearly ignores potentially interesting information. One question 
one could then ask is if there is an interesting generalization of barcodes, which could be used to understand 
the rank invariant in the same way as barcodes do for the single variable case. This can be done as follows. 

Suppose we are given an n-graded A(n) module M, and we have computed the dimension and rank invariants 
dM and r m- We say two elements s and t of V n are p-related, and write s ~ p t, if (a) o?m(s) = Tur (F) and (b) 
tm {s, t) = dM{s) = d,M (f) ■ We then let denote the equivalence relation generated by ~ p . This equivalence 
now gives a partition of the set N". When we imagine the module as arising from a multidimensional 
persistence vector space, we can imagine the various nodes in the persistence diagram (i.e. the elements of 
N n ) as embedded in M”, and we can assume that they are embedded quite densely, so that adjacent points 
are very close in the actual Euclidean space. One can now color code the regions in the partition associated 
to ~ p , and if the dimension is < 3, obtain a kind of image describing the regions of the partition. A typical 
output might look like the image below. 



Regions of constant coloring then correspond to regions in which the vector space is constant, i.e. all the 
members of a single color region have the same dimension, and further can be connected by a sequence of 
isomorphisms associated to comparable members of N n . These regions correspond to intervals of constancy 
in the barcode, i.e. intervals which contain no endpoints of any intervals occurring in the barcode. 

A difficulty which must be addressed in working with multidimensional persistence is the computational 
efficiency, and indeed setting up a viable computational framework. In the case of single variable persistence, 
we use algorithms developed for computing the Smith normal form of a matrix over a principal ideal domain. 
This machinery is not available in the multivariable case, but it turns out that it can be replaced by 
the Grobner basis methodology (see [18] or [47]). The part of that methodology which is relevant is the 
multigraded version of the notion of a Grobner basis for a submodule of a free finintely generated module, 
the Buchberger algorithm for constructing such a basis, and the algorithm for constructing syzygies attached 
to homomorphisms of free multigraded modules (Schreyer’s algorithm). These results are developed in 
[14]. The Grobner basis provides a very compact description which contains all the information about the 
multidimensional persistence problem. In particular, it permits the reconstruction of the rank invariant, 
which naively would have to be stored as sets of values for very large sets of inputs. 


4.2 Quivers and zigzags 

In Section 2.3, we developed the notion of a P-persistence object in a category C, where P is a partially 
ordered set. We then developed the theory of such persistence objects in the case P m N to define and 
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analyze persistent homology. In the previous section, we extended this development to the case V = N n , 
and saw that it permitted the study of additional kinds of problems not addressed in the case n = 1. In this 
section, we wish to develop the notion of 'P-persistence for another class of partially ordered sets, and show 
that it allows us to address some interesting classes of problems. The results of this section will appear in 
joint work with V. de Silva. We begin by listing the types of problems we wish to study. 

Example: Suppose that we are given a large data set X, and we wish to study its homological invariants 
by studying the corresponding invariants of subsamples from X. So, for example, if one wanted to estimate 
the first Betti number of a putative space X underlying X , one might build a Vietoris-Rips complex with 
a fixed e for a collection of many samples, and if sufficiently many of them compute the first Betti number 
to be n, then one might guess that the first Betti number for X is n. However, the picture below suggests 
what might go wrong with such an approach. It is a schematic picture of two different data sets, colored in 
yellow. 



Note that in the leftmost data set, the dominant qualitative picture is that of a single loop, and one can expect 
that with reasonable frequency samples many produce point clouds which capture the circular structure 
through a barcode computation, in the way illustrated by the green and red loops. In the rightmost data 
set, though, one sees many different smaller circles, and one can imagine that each of the different samples 
might compute a first Betti number of one, but where each one corresponds to a different loop, as is again 
indicated by the green and red loops. One can attempt to distinguish these by insisting that there be some 
measurable notion of compatibility between the computations. Here is one such notion. Suppose we have a 
family of samples C X , for i = 0,1,..., N. For each i, with 0 < i < N — 1, we consider also the sample 
Tj = Si U Si+i, and note that we have inclusions 5, T, and Si+ 1 Tj. This means that we actually have 
a diagram of samples from X 


T 0 Tj ••• 



So Si ••• S N - i S N 

The value of a diagram of this form is that if we are given elements x 0 G H k (VR(S 0 )) and x\ G H k (VR(Si)), 
we can obtain information relating the two classes by considering their images io{%o) and jofai) in the group 
H k (VR(To)). If we find that io(xo) = jo{xi), then this acts as confirmation that the two elements correspond 
to an element arising from the full data set X. Of course, there is no certainty arising from this, but it suggests 
the likelihood that this occurs. Note that in the example above, we will find this kind of compatibility in 
the left hand set, and not in the right. We would like to develop a systematic methodology which assesses 
the frequency of these kind of compatibilities. 

Remark: The idea of recovering information about a large data set by studying behavior of various statistics 
on subsamples is an example of the bootstrap method due to B. Efron [26]. In that context, one studies means, 
variances, and other quantities evaluated on samples, and then assesses how these statistics vary over the 
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samples. This is regarded as more informative than simply evaluating those statistics directly on the full 
data set. We can view the kind of analysis we are proposing above as a version of the bootstrap idea which 
is adapted to structural information, such as presence of loops or cluster decompositions, rather than to 
numerical values. 

Example: Suppose that we are given a data set X equipped with a map r to the real line R. For example, 
if we have data containing a time component, then a choice of r would be the time coordinate. For any 
to < ij, we let X[t 0 ,ti] denote the set {x € X\to < p(x) < t{\, and let Z(i) = X[i,i + 1] U X[i + 1 ,i + 2]. 
Then we have a diagram of point cloud data sets 


Z(t) Z(i + 1) 



X[i,i + 1] *[* + !,*+ 2] X[i + 2,i + 2,] 


We note that the “shape” of this diagram is identical to the shape of the diagram in the previous example. 
The description of the behavior of the diagram of vector spaces obtained by applying homology to the 
individual terms should be a useful way of tracking how the data behaves dynamically, assuming one can 
find a useful summary of the nature of the diagram of vector spaces. 

Example: Density estimation is an important subject in statistics, and much of what one wants to do in 
analyzing a data set is to describe in a useful way the behavior of functions which estimate density in some 
way (see [58]). One way to do this is via kernel density estimators. Suppose that we are given a data set 
X embedded in Euclidean space R n . For any point v £ K", and positive number e, we let 7 Vj(S denote a 
spherically symmetric Gaussian distribution with center at v and with variance e. Then one can construct 
the function 


as an estimate of the density of the distribution from which X arises by sampling. The resulting function 
depends on the parameter e. For large values of e, one is estimating density in a way which assigns significant 
weight to points which are far from the given point x, and for smaller values of e, one is estimating density 
where one weights much more heavily the points in a smaller neighborhood of x. If e < e', then 5 e > is a 
smoothed out version of S e . For a given X, it is an interesting question to determine which choice of e is 
“correct”, and statisticians have developed useful heuristics along these lines. Another approach, though, 
is to attempt to provide a summary of the behavior of invariants over the full range of values of e at once. 
Fixing a percentage threshhold T, we define the set X[T,e\ to be the set of points lying in the T% densest 
points as measured by the estimator S e . If we now fix a sequence of values eo < ei <■■■<€&, and set 
Zi = X[T, Cj] U X[T, q ..ij, we can now construct the following diagram of data sets. 



X[T,ei] X[T,e i+1 } X[T,e i+2 } 


If we apply the Vietoris-Rips construction with a fixed parameter value to the diagram, and then apply Hj 
for some j, we will obtain another diagram of vector spaces of the same shape as what we have been looking 



at in the earlier examples. It should then be helpful in tracking the qualitative behavior of the density 
estimator with varying e. 

Example: We recall the witness complex construction introduced in Section 2.2. Recall that this was a 
complex W W (X, £,e) constructed on the point cloud X, using a finite “landmark set” £ and a positive 
parameter e. It would clearly be informative to understand the extent to which homology calculations 
or clustering depends on the choice of landmark set £, but there is no apparent relationship between the 
complexes W W (X,£, e), even if the one landmark set is contained in the other. One can, however, proceed 
as follows. Given a point cloud X and two landmark sets £ and £', we construct a two-variable version of 
the witness complexes, denoted W w (X,£,£',e). Its vertex set is £ x £', and we declare that a collection 
{(Zo, l'o), ■ ■ •, (h, l'k)} spans a fc-simplex if and only if there exist e weak witnesses x and x' in X for the 
collections {lo,---,h} and {1' 0 ,... ,l' k }, respectively. It is roughly analogous to the Cech complex of the 
covering by intersections of Voronoi cells in two different Voronoi decompositions of a metric space. We note 
that we have projections 


W w (X,£X',e) -+ W w (X,£,z) and r(X,£,£'^w W w (X,£',e) 

induced by the vertex maps £ x £' —> £ and £ x £' —> £'. Suppose now that we have a family of landmark 
sets £i for X. We let W, : = W W (X. £, e) and U t = W W (X, £ h £ i+1 ). Then we have a diagram 


Wi Wi- 



Ui-i Ui U i+1 


Once again, one can apply homology to the diagram, and the compatibility of classes under the maps in 
this diagram should be a useful indication that the classes are intrinsic to X and do not depend on the 
choices of landmark points. This kind of compatibility information would, for example, be extremely useful 
in interpreting the neuroscience results described in Section 2.5. 

The question we now face is how to formalize the notion of compatibility in these diagrams. Here is how 
one can proceed. We first define a partial ordering on the set of integers Z. We will declare that for every 
integer k, we have 2k + 1 > 2/e and 2k + 1 > 2k + 2. Thus, every odd number is greater than its two adjacent 
even numbers, and except for identities, there are no other comparabilities. 


2k — 1 2k + 1 2k+ 3 



2k-2 2k 2k+ 2 2fc + 4 


We will denote this partially ordered set by Z, and we will denote by Z[m,n\ the subset of all integers 
i with m < i < n, where m and n are integers with m < n. We note that all the examples above are 
either ^-persistence sets or Z[rn, n]-persistence sets, and when one applies a Vietoris-Rips construction one 
obtains corresponding Z or A4(m. nj-persistence simplicial complexes. Applying H t for some i to each of 
these diagrams then gives Z or Z[m, n]-persistence vector spaces. The key ingredient in ordinary persistence 
is the observation that there is a classification of N-persistence vector spaces, and it turns out that there is a 
classification of the isomorphism classes of Z[m, n]-persistence vector spaces, which is proved and discussed 
in [29]. We will describe the structure of this classification. 
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Fix m < n. For any too < no, with to < too < no < n, we will define an elementary object E(mo,no ) as 
follows. In order to specify a Z[m, n]-persistence vector spaces, it is sufficient to specify (a) a vector space V t 
for all to < i < n and (b) linear transformations V^+i —» 14 fc and I4fc+i -+ I4fc+2 whenever the two vector 
spaces Vi involved in the linear transformation are defined. To define E = E(m 0 , no), we set Ei = k for 
mo < i < no, and Ei = {0} otherwise, and we declare that all transformations are the identity if they can 
be. If they cannot, then they involve a vector space which contains only the zero vector, and are therefore 
of necessity equal to zero . See below for a picture of £(1,4). 



The upper row consists of the vector spaces for the odd integers, and the lower row of the ones for the 
even integers. The three dots indicate an array of zero vector spaces. Now, as in Section 2.3, we have a 
notion of morphism of Z[m, n]-persistent vector spaces, and we can therefore ask for the classification up to 
isomorphism of then. Here is the theorem, which can be extracted from [29]. 


Theorem 4.5 Let M denote any Z[m,n]-persistent vector space, so that every vector space V) is finite 
dimensional. Then there is an isomorphism 

M — (J) E(rrij , hj) 

for some t and a family of pairs of integers ( mj,nj ), such that m <m,j < nj < n for all j. Moreover, this 
decomposition is essentially unique, in the sense that t is the same for all such decompositions, and further 
that the pairs ( rrij,nj ) are also unique up to a reordering of elements. 


One can use this result to obtain the following straightforward consequence. 


Corollary 4.6 We say a Z-persistence vector space M is finite if each Mi is finite dimensional and there 
exists an integer N such that Mi = {0} if |i| > N. Any finite Z-persistence vector space M can be decomposed 


M 2s 0 Efjn^nj) 

for some t and a family of pairs of integers ( m,j,nj), such that mj < nj. The decomposition is again 
essentially unique. 


Remark: Z[m, n]-persistence vector spaces are examples of quiver representations, a highly developed area 
of algebra. See [29] for a complete description. 

The families of intervals in each of these decompositions can be interpreted as barcodes with integer valued 
endpoints. We argue that the analysis of these diagrams should be useful in the analysis of the kinds of 
problems we have discussed above. We give intuitive illustrations of how this might work. We suspect that 
there are theorems which could be proved in this direction, and we hope that that will be the subject of 
future work. 
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Example: In the context of clustering samples as above, supposing that one obtains a do-barcode with two 
long lines and a family of shorter lines. This outcome would suggest the possibility that one is really looking 
at two essential clusters in the full data set, with others arising out of the sampling. We view this idea as a 
potential contributor to the study of consistency or stability of clustering [4], [43]. 

Example: In the context of variable landmark sets as above, suppose one obtains a -barcode with one 
long line. This would be confirming evidence toward the hypothesis that the original data set has a /?i of 
one, and that what one is seeing in each of the witness complexes is a reflection of that qualitative feature. 

Example: In considering dynamic data, the barcodes obtained should be useful in understanding the 
topological transitions occurring in a changing data set. They should, for example, give a guide to the 
behavior of clusters over time. 

Remark: It is interesting to note that the problems mentioned here are interesting and difficult even in the 
analysis of /3o, so that the methods we propose should give interesting new information about the behavior 
of clustering. 


4.3 Tree based persistence 

As we have seen above, algebraic topology is capable of producing signatures which indicate the presence 
of topological features within a space. As it stands, however, it is not capable of describing the source of 
the feature, i.e. where in the space the hole or other feature is located. By using persistence, one is able 
to develop a systematic way of addressing such questions. That methodology has been described in [65]. In 
this section, we describe what was done there, and also suggest another persistence framework into which it 
might fit. 

We suppose that we are given a simplicial complex E and a covering S = {E Q }ae/i of E by subcomplexes. 
For example, if we suppose that we are given a set of point cloud data X and a covering U = {U a } ae A of 
X, one could obtain a covering of VR(X, e) by the full subcomplexes of VR(X, e) on the vertex sets U a . 
The use of filters as described in Section 3 above can provide coverings of X, or one could cover using balls 
with centers distributed through the space. One might also cover X by versions of Voronoi cells, as in the 
discussion of witness complexes in Section 2.2. 


Definition 4.7 Let x £ Hi( E). We say x is iS-small if x £ im(i a : Hi( E a ) —*■ i?»(E)) for some a. 

Once one determines that a class is «S-small, and for which a the class x is in the image of i a , one has in 
effect shown that the feature at least can be represented within the given subset of E, and so has information 
about the source of the class. We recall the Mayer-Vietoris blowup construction M. (|E|,«S) from Section 3.2. 
This is a space equipped with projection maps 7 ta : Af (|E|, S) —> A [A], where A is the indexing set for the 
covering S, and ps : A4(|E|,5) —> |E|. It can be shown that ps is a homotopy equivalence (see [65]). One 
can consider the skeletal filtration (A[A]( fe )} on A [A], where A[A]( fe ) is the subspace consisting of the union 
of all faces of dimension < k, and the corresponding filtration 7r^ 1 (A[A]( fe ^) on A4(|E|,5). The following 
propositions are now easy to verify. 


Proposition 4.8 A homology class x £ Hj(|E|) is S-small if and only if x is in the image of the homomor¬ 
phism R i (7rX 1 (A[A]#Jg -» 

Proposition 4.9 If a homology class x £ iJj(|E|) is in the image of 


-£fi(|E ao | U |E ai | U • • • U |E afc |) —> Hj(|E|) 




for some (k + 1 )-fold collection of elements of S, then x is in the image of the homomorphism 

Note that this means that we have an {0,1,, #(A) — l}-persistence vector space {i? i ((7r^ 1 (A[A] 
which contains information about the where the homology classes in £ arise from. If we are asking whether 
the class arises from an individual set in S, then the persistence vector space gives us a complete answer to 
this question. If the question is instead whether or not an element arises from a union of k elements of S, 
then we do not obtain complete information this way, but we do obtain partial information in that we can 
preclude the possibility that a class arises from such a union. Examples of the application of this approach 
are given in [65]. 

We will also suggest the possibility of another approach to the question of determining the origin of homology 
classes. Consider first any rooted tree (T,v o), where vq is the root. The vertex set of T can now be given 
a partial ordering <t defined by vi <t V 2 if and only if the shortest path from V\ to vq contains v 2 - The 
properties of trees guarantee that <t is a partial ordering. Next, suppose that we have a simplicial complex 
£ with a family of coverings <S, = {£ a } ae ^. by subcomplexes, equipped with functions 0, : A t -4 A i+1 such 
that for any simplex a G £ and a G A,, we have that cr G £ a implies that a G £# 4 ( a ). We suppose that 
there is an integer N so that An consists of a single element, and that therefore the covering Sn consists 
only of £ itself. This covering data gives rise to a rooted tree whose vertex set is Uil=i an< l where the 
edges are all of the form (a,9ia), for some a G A,. The single element in A n is a root for the tree. This 
kind of family of coverings can arise in natural ways from certain coverings of complexes or spaces. 

Example: Consider the unit interval I = [0,1], and fix N. Let Si denote the covering of I given by the family 
of intervals [ 2 N~i , « ], where 0 < k < 2 N ~ l — 1 is an integer. Let A* denote {k G Z|0 < k < 2 N ~ l — 1}, 

and define 6i : Aj —» A, +1 to be the function k —> |_f J • We now have a family of coverings, which become 
increasingly “coarse” as i increases. The associated tree is a binary tree with 2 iV_1 leaves. 

Example: If our space is I n instead, we may cover by products of the sets in the coverings S l . 

Let Si = {U a } a( zA, be a family of coverings as above, and 0, : A, —> A i+1 be maps of coverings as above. 
Let (T,v o) denote the associated rooted tree, and Vt its vertex set equipped with the partial ordering < T . 
Then we define an associated (Vt, <r]-persistence vector space {W t } te v T as follows. Given a G A, : C W, we 
set W a = Hi(U a ), and whenever a <t of ', then the associated linear transformation from H(U a ) to H(U a <) 
is the map induced from the inclusion U a U a > = Ug.( a y The idea of studying the source of homology 
classes should now be rephrased in terms of invariants of (Vt, uo)-persistent vector spaces. This situation 
is a bit like the situation encountered in Section 4.1 in that the classification will involve points on positive 
dimensional varieties over the ground field. Nevertheless, it appears plausible that one can construct useful 
invariants, such as the rank invariant discussed there. 


5 Reasoning about clustering 


As we have noted above, clustering algorithms are methods which take as input a finite metric space (X, d) 
and produce as output a partition II(X, d) of the underlying set X. In [39], J. Kleinberg proves a non¬ 
existence theorem for clustering algorithms satisfying certain properties, in a spirit similar to that of the 
Arrow impossibility theorem. We will enumerate three properties which may be satisfied by clustering 
algorithms. 


1. Scale invariance: Given a finite metric space (X, d), the partitions of X associated to (X, d) and 
(X, rd), where r > 0, are identical. 
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2. Richness: Any partition of a finite set X can be realized as II(X, d) for some metric d on X. 

3. Consistency: Suppose that d and d' are two metrics on a finite set X, and suppose that (a) for 
any x,x' contained in one of the clusters attached to d (i.e. blocks in the partition Tl(X, d)) we have 
d'(x,x') < d(x, x') and (b) for any x,x' which belong to distinct clusters attached to d, we have 
d'{x,x) > d(x, x'). Then n(W,d) = U{X,d'). 


Kleinberg’s theorem is now 

Theorem 5.1 (Kleinberg, [39]) There are no clustering algorithms which satisfy scale invariance, rich¬ 
ness, and consistency. 

This interesting result is disappointing in that it does not give guidance concerning the choices of clustering 
algorithms, but rather points out deficiencies from which all clustering algorithms must suffer. It is therefore 
interesting to identify situations in which one can prove existence, and perhaps existence and uniqueness 
given certain properties. As Kleinberg points out, it is possible to do so in a number of different ways by 
specifying cost functions on particular clusterings, and prove a uniqueness results for optimal choices of 
clusterings with respect to this cost function, but that this is perhaps less interesting in that cost functions 
can often be defined which will isolate a particular algorithm. In [9], such a context is developed which is 
not dependent on the choice of a cost function but is rather on “structural” criteria which have a great deal 
in common with Kleinberg’s requirements. We now describe the main result of [9]. 

We begin with the informal observation that clustering for finite metric spaces can be thought of as the 
statistical version of the geometric notion of forming the set 7To(X) of connected components of a topological 
space X. We note that the correspondence X ^ iro{X) can actually be viewed as a functor (see [44]) from 
the category of topological spaces to the category of sets, in the sense that a continuous map / : X —> Y 
induces a map of sets 7To(/) : ffo(X) — > 7To(Y), satisfying certain obvious conditions on composite maps and 
identity maps. This observation is much more than a curiosity. It is the basis for many comparison theorems 
in topology, and in fact underlies the combinatorization of topology obtained via simplicial sets [45], [19]. 
It is also what underlies the theoretical constructions underlying the Mapper algorithm discussed in Section 
3. Finally, it is also the basis for etale homotopy theory, which adapts topological methods to the study of 
number theoretic problems. The naturality of this condition as well as its utility in many other mathematical 
contexts suggests that it is very natural to formulate such a condition for clustering algorithms as well. We 
will therefore attempt to describe clustering algorithms as functors between two categories, where the domain 
category has as its objects the collection of finite metric spaces. One must therefore first define a notion of 
morphism of finite metric spaces. We define several such notions. 

1. Isometry: An isometry from a finite metric space (X,dx) to another finite metric space ( Y,d Y ) is a 
bijective map of sets / : X —>Y so that 

d Y (f(x),f(x')) = dx(x,x') for all x,x' £ X 

2. Embeddings: An embedding from a finite metric space (X, dx) to another finite metric space (Y, d Y ) 
is a map of sets / : X —> Y" so that 

d Y (f(x),f(x')) = dx(x,x') for all x,x' £ X 


3. Monomorphisms: A monomorphism from a metric space X to another metric space Y is a monic 
set map / : X —>Y so that for all x, x 1 £ X, we have d Y (f(x),f(x')) < d(x,x'). 
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4. Distance non increasing maps: A distance non increasing (dni) map from a finite metric space 
(X,dx) to another finite metric space ( Y,dy ) is a map of sets f : X —> Y so that dy(f(x),f(x')) < 
dx(x,x') for all x,x' £ X. 

Each of these notions creates the structure of a category whose set of objects are the finite metric spaces, 
since one can readily observe that each of the classes of morphisms is closed under composition and contains 
the identity. We denote each of these categories by A4* so , A4 emb , A4 mon , and A4 9en , respectively. One could 
initially hope to study clustering algorithms as functors from each of these categories to sets. Thinking in 
these terms, though, it is first clear that not every functor on one of these categories deserves the name 
“clustering functor”. To see this, we return to the geometric notion of connected components. Not only 
is the correspondence 7To(—) a functor from spaces to sets, it is equipped with a natural surjective map of 
sets fix '■ X —> 7Tq( X j. Further, these surjective maps have the property that for every continuous map 
f : X —>Y, the diagram of sets 


TTO(X) - 


- Tro(y) 


commutes. 

Remark: In formal terms, the maps r/x, as X runs over all topological spaces, form a natural transformation 
from S to 7To(X), where S denotes the “underlying set” functor from the category of topological spaces to 
the category of sets. 

By arguing with the analogy with the connected component construction, we will begin with a provisional 
definition of a C-functorial clustering algorithm, where C is one of the above mentioned categories. It will be a 
functor % from C to the category of sets together with a family of surjective maps of sets ri(x,d) '■ X —> x(X- d) 
so that the diagrams 


commute for every morphism / : (X. dx) —> (Y. dy) in C. 

It is clear that a C-functorial clustering algorithm is a clustering algorithm in the sense of Kleinberg, since 
the surjective map of sets from X to x(X, dx) yields a partition of X, namely the partition whose blocks 
are the sets r]f^{z), as 2 ranges over all elements of x(X- dx). This means that Kleinberg’s conditions also 
make sense in this context. We examine the possible clustering functors in two of these cases. 

Example: Af ' so -f'unctorial clustering algorithms are very simple to describe. Let 1 denote the collection 
of isometry classes of finite metric spaces, and for each l let (X t , d, ) denote an element of the isomorphism 
class t. Let Let G, denote the automorphism group of (X t , d,J, and let V, denote the set of all possible 
partitions on A,. Clearly the group G, acts on the set V t , and we let Vf" denote the fixed point set of that 
action. A AT" so -functorial clustering algorithm determines a choice p, € Vf 1, for every 1, and conversely an 
arbitrary choice of such pfs determines a AT™-f'unctorial clustering algorithm. If we impose Kleinberg’s 
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scaling condition, we must instead decompose the set of all finite metric spaces into equivalence classes, 
where two metric spaces are in the same equivalence class if and only if one is isometric to a rescaling of 
the other. The classification is now determined exactly as above, except one needs only to make a choice for 
each of these new equivalence classes. 

Example: The study of general Af gera -functorial and M rnon clustering algorithms is more subtle, and we 
do not yet have a general classification. We will instead give some examples. 


• Fix a threshhold parameter e, and for a finite metric space (X,dx), we let ~ e denote the equivalence 
relation generated by the relation R e on X defined by xR e x' if and only if d{x,x') < e. Then the 
clustering algorithm which assigns to each (X,d x ) the partition associated to is clearly M gen - 
functorial. This example corresponds to single linkage clustering with a fixed parameter value e. 

• Consider the finite metric space A[n] e whose elements are {1,2,... ,n}, and where d(i,j) = e for all 
pairs 1 < i <• j < n. For any finite metric space (X, d x ) we define a new relation lZ e on X by the 
requirement that xlZ e x' if and only if there is a distance non-increasing inclusion j : A[n] e (X,dx) 
such that 1) = x and j(2) = x’. We then let E e denote the equivalence relation generated by 
1Z e . The clustering algorithm which assigns to each (X, d x ) the partition of X into the blocks of the 
equivalence relation E e is now clearly Af mo "-functorial. This notion of clustering is closely related to 
clique clustering algorithms in network clustering [53]. 

• More generally, for any family $ of finite metric spaces, one can define clustering algorithms attached 
to $ by analogy with the previous example, where the relation involves injective maps from elements 
of the family or possible arbitrary maps of finite metric spaces from the family. 

• For any finite metric space (X, d x ), define £t(X) to be the minimal non-zero distance between distinct 
points of X. The clustering algorithm which assigns to each metric space (X, d x ) the clustering 
associated to the equivalence relation generated by Rbkk is readily checked to be Af TOO "-functorial. 

It is now easy to show that if one imposes the scale invariance condition of Kleinberg, one finds that there 
are no non-trivial A4 !> e ” -functorial claustering algorithms, where the trivial ones are understood to mean 
the discrete one (with one element clusters) and the indiscrete one (in which X forms the single cluster for 
every (X,d x )). 

Non-existence results are of course interesting, but more useful from the point of view of applying and 
using clustering algorithms are situations where existence and uniqueness can be proved. The non-existence 
result mentioned in the previous example suggests that one should look for a more relaxed framework. In 
order to do this, we will change the target category for clustering algorithms from the category of sets 
to the category of R + -persistent sets, where R + denotes the non-negative real numbers. This is not an 
unreasonable thing to do in view of the fact that hierarchical clustering algorithms do not report single 
partitions but rather dendrograms, which are roughly speaking BA-persistent sets. We have already defined 
the notion of morphisms of R+-persistent objects in any category. A morphism of persistent sets is said to 
be surjective if each of the individual morphisms which make it up is surjective. For any finite metric space 
(X,d x ), we associate to it the M + -persistent set a(X) = {a(X) r } r for which a(X) r = X for all r e K + , 
and so that all the morphisms a(X) r —* a(X) r >, for r < r', are the identity morphisms on X. Then by a 
persistent clustering algorithm we will mean an assignment to every finite metric space (X, d x ) a persistent 
set £(X,d x ) and a surjective morphism a(X) —> £(X,d x ) of persistent sets for every finite metric space 
(X,d x )- Letting C be any of the category structures on the collection of finite metric spaces given above, 
one can now define a corresponding notion of C-functorial persistent clustering algorithms as follows. Such 
a clustering algorithm is a functor £ from C to the category of persistent sets, equipped with a surjective 
morphism of persistent sets r/ x ■ a(X) -A £(X, d x ) for every (X, d x ), so that the diagrams 
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<*(X) ,<f) ► a(y) 



C(X,dx) £(Y, d Y ) 


commute for every morphism / in C. In this context, Kleinberg’s scale invariance condition takes a different 
form. For any R + -persistent set U = {U r } and any positive real number t, we define the persistent set tU 
by tU r = U'- , and by requiring that for any r < r', the homomorphism tU, —* tU r > be identified with the 
corresponding homomorphism 141 —> U,' . The correspondence U —> tU is clearly a functor from the category 
of persistent sets to itself, which we will write as 8 r . We also have the rescaling functor p r from the category 
C to itself, which simply multiplies all distances by r. We now say that a persistent clustering algorithm £ 
is scale invariant if (a) x(Pr(X,dx)) = S r (x{X, dx)) and (b) f] Pr (x,d x ) = ^r(V(x,d x ))- I n [9]> the following 
result is proved. 

Theorem 5.2 Let E denote a metric space with two points, and so that the distance between those two 
points is = 1. Let P denote the persistent set for which P r consists of two points for r < 1, of one point 
for r > 1, and so that all the maps P r —» P r i are surjective for all r < r'. There is a unique M. 9en - 
functorial persistent clustering algorithm S which is scale invariant and which satisfies the requirement that 
S(U) = P. The algorithm S is the algorithm which associates to a finite metric space (X,dx) the persistent 
set{MVR(X,e))} e > 0 . 

The proof is not difficult. This result is in the spirit of Kleinberg’s result in that the requirements which 
define it are structural rather than requiring the optimization of some cost function, but yields an existence 
and uniqueness result. This theorem therefore precludes a number of interesting algorithms such as average 
linkage and complete linkage clustering from being functorial. Users of clustering algorithms often assert 
that average linkage and complete linkage clustering are to be favored over single linkage clustering because 
the clusters tend to be more “compact” in an intuitive sense. We believe that their observation should 
be interpreted as saying that in clustering one needs to take into account more than just the metric as a 
geometric object, but in addition some notion of density. This suggests the possibility of including density 
into notions of functoriality, which we will explore in a future paper. 


6 What should the theorems be? 

We have presented some examples of how topological methods can be applied to study data sets. The 
methods provide signatures which yield information about the shape of the data sets being studied, and 
can also provide useful methods for visualizing data sets. However, there are many outstanding questions 
of a statistical nature. One situation which illustrates some of these questions was that which occurred 
in the discussion of the neuroscience data in Section 2.5 above. In that case, one found that the data 
exhibited barcodes which appeared to indicate the presence of non-zero /?i and fh in the data sets. While 
the observation is of course suggestive, to prove that it indicates structure in the data set one must show 
that segments of the indicated length cannot occur in a random model associated to a null hypothesis. In 
the paper [59], this was shown by simulation, which is of course always an option. However, it is clear that 
if it were the case that such results could actually be proved, one would avoid the time and effort spent in 
simulation, and it would also provide the basis for thinking more systematically about significance questions 
for the barcodes arising out of persistent homology. We outline how one might begin formulating such results. 

Suppose we are given a metric space X and a probability measure on X. Then we can consider an experiment 
consisting of selecting n points {xi,X2, ■ ■ ■, x n } i.i.d. from X. For each time this experiment is carried out, we 
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can then form the R + -persistent simplicial complex {VR({x i, ... , x n }, e)} e >o, the corresponding homology 
vector spaces {Hi(VR({xi,.... x n }, e))} e >o, and finally therefore a barcode. We note that the barcode is 
simply a family of intervals with endpoints in R + . Each such interval can be considered as a point in the 
set D = {(a:, y) e M 2 |x < y}, and the output is therefore a finite subset of the set D. As such, we now have 
a finite spatial point process (see [20], [3]). Roughly, a finite spatial point process with values in a locally 
compact second countable Hausdorff space S is a probability measure on a cr-algebra constructed on the 
collection of finite counting measures on S. The cr-algebra is the minimal one which makes all the counting 
maps <f>B measurable, where B is in the Borel cr- algebra associated to S, and <3 >b evaluates the counting 
measure on B. Point processes are a heavily studied area within statistics. 

We believe that theorems which describe these point processes will be very useful in applying algebraic 
topology to the qualitative analysis in many areas of science. Here are some suggestions which would be 
interesting. 


• Determine the distributions of various statistics, such as maximum and average distance to the diagonal, 
on the point processes attached to the Vietoris-Rips complexes obtained by sampling sets of points 
from various probability distributions on Euclidean space. Gaussian distributions are a good place to 
start. 

• Similarly, determine distributions of various statistics on point processes obtained by selecting sets 
of landmark points using various strategies from probability distributions on Euclidean space and 
computing the persistent witness complexes. 

• Study consistency of clustering problems by studying the distributions of various statistics on the 
zig-zag barcodes obtained by sampling from a larger data set. 

• Develop a method for studying the output of multidimensional persistent homology probabilistically. 


We point out that in the context of ordinary homology (i.e. not persistent homology), M. Penrose and others 
have developed a theory of “geometric random graphs”, and have proved various results concerning the Betti 
numbers of complexes attached to such graphs (see [54]). Also, results about barcodes under the general 
heading have now begun to appear (see [15] and [16]). These results should be an excellent starting point 
for the development of theorems of the type mentioned above. 
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